Hedera: Dynamic Flow Scheduling for Data Center Networks
Mohammad Al-Fares, Sivasankar Radhakrishnan, Barath Raghavan,
Nelson Huang, Amin Vahdat
- USENIX NSDI 2010 -
Presenter: Jason, Tsung-Cheng, HOU
Advisor: Wanjiun Liao
Dec. 22nd, 2011
Problem
• Relying on multipathing, due to…
– Limited port densities of
routers/switches
– Horizontal expansion
• Multi-rooted tree topologies
– Example: Fat-tree / Clos
Problem
• BW demand is essential and volatile
– Must route among multiple paths
– Avoid bottlenecks and deliver aggregate BW
• However, current multipath routing…
– Mostly: flow-hash-based ECMP
– Static and oblivious to link-utilization
– Causes long-term large-flow collisions
• Inefficiently utilizing path diversity
– Need a protocol or a scheduler
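A rough illustration of why flow-hash-based ECMP is static and oblivious to link utilization: the path is chosen from a hash of the flow's header fields only. The function name `ecmp_path`, the use of CRC32 as the hash, and the example addresses are assumptions for illustration; real switches use their own hash functions.

```python
import zlib

def ecmp_path(flow, num_paths):
    """Pick an equal-cost path index from a hash of the flow 5-tuple.

    Illustrative only: CRC32 stands in for a switch's hash function.
    The choice depends only on header fields, never on link load.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_paths

# Five hypothetical long-lived flows: (src IP, dst IP, proto, sport, dport)
flows = [("10.0.0.%d" % i, "10.2.0.%d" % i, 6, 40000 + i, 5001) for i in range(5)]
paths = [ecmp_path(f, num_paths=4) for f in flows]

# The mapping is static: hashing the same flow again gives the same path,
# however large the flow is and however loaded that path has become.
assert paths == [ecmp_path(f, 4) for f in flows]

# With 5 flows on 4 paths, at least two flows must share a path.
collisions = len(flows) - len(set(paths))
print(paths, "collisions:", collisions)
```

If the colliding flows are elephants, the collision lasts as long as the flows do, which is the long-term large-flow collision named above.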
Collisions of elephant flows
• Collisions in two ways: Upward or Downward
[Figure: fat-tree with hosts labeled D1/S1, D2/S2, D3/S3, D4/S4]
Equal Cost Paths
• Many equal cost paths going up to the core
switches
• Only one path down from each core switch
• Need to find good flow-to-core mapping
[Figure: equal-cost paths between source S and destination D]
Goal
• Given dynamic flow demands
– Need to find paths that maximize
network bisection BW
– No end-host modifications
• However, local switch information is
insufficient to find a proper allocation
– Need a central scheduler
– Must use commodity Ethernet switches
– OpenFlow
Architecture
• Detect Large Flows
– Flows that need bandwidth but are network-limited
• Estimate Flow Demands
– Use max-min fairness to allocate flows between SD
pairs
• Allocate Flows
– Use estimated demands to heuristically find better
placement of large flows on the EC paths
– Install the placement in the switches and iterate again
[Figure: control loop: Detect Large Flows → Estimate Flow Demands → Allocate Flows]
Architecture
• Feedback loop
• Optimize achievable bisection BW by
assigning flow-to-core mappings
• Heuristics of flow demand estimation and
placement
• Central Scheduler
– Global knowledge of all links in the network
– Control tables of all switches (OpenFlow)
Elephant Detection
• Scheduler polls edge switches
– Flows exceeding threshold are “large”
– Threshold: 10% of a host’s link capacity (> 100 Mbps on a 1 Gbps link)
• Small flows: Default ECMP hashing
• Hedera complements ECMP
– Default forwarding is ECMP
– Only schedules large flows contributing
to bisection BW bottlenecks
• Centralized functions limited to the essentials
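The detection rule above can be sketched as a simple filter over polled per-flow rates. The function name `detect_large_flows`, the flow IDs, and the shape of the polled data are hypothetical; the 10% threshold and the 1 Gbps link come from the slide.

```python
def detect_large_flows(flow_rates_bps, link_capacity_bps=1_000_000_000, percent=10):
    """Return the IDs of flows whose rate exceeds the threshold.

    flow_rates_bps: dict of flow ID -> measured rate in bit/s, as a
    scheduler might collect it when polling edge switches (assumed shape).
    """
    threshold_bps = link_capacity_bps * percent // 100   # 100 Mbps for 1 Gbps
    return {fid for fid, rate in flow_rates_bps.items() if rate > threshold_bps}

# Hypothetical poll result from one edge switch
polled = {"f1": 950_000_000, "f2": 100_000_000, "f3": 120_000_000, "f4": 2_000_000}
large = detect_large_flows(polled)
print(sorted(large))
```

Everything not returned here stays on default ECMP forwarding; only the returned flows are handed to demand estimation and placement.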
Demand Estimation
• Current flow rate: misleading
– May be already constrained by network
• Need to find flow’s “natural” BW
demand when not limited by network
– As if only limited by NIC of S or D
• Allocate S/D capacity among flows
using max-min fairness
• Equals the BW allocation under optimal
routing; input to the placement algorithm
Demand Estimation
• Given pairs of large flows, modify
each flow size at S/D iteratively
– Sender S distributes its unconverged BW equally
among its unconverged flows
– Receiver R, if limited: redistributes its BW among
the flows demanding more than their equal share
– Repeat until all flows converge
• Guaranteed to converge in O(|F|)
– Linear in the number of flows
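A minimal sketch of the iterative estimation described above, with demands expressed as fractions of NIC capacity (1 = full link). The class and function names are hypothetical, and the code follows the sender/receiver steps on this slide rather than reproducing the authors' implementation. The input is the four-flow example (senders A, B, C; receivers X, Y) used in the slides.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class Flow:
    src: str
    dst: str
    demand: Fraction = Fraction(0)   # fraction of NIC capacity
    converged: bool = False
    receiver_limited: bool = False

def estimate_source(flows, src):
    """Sender step: split unconverged capacity equally among unconverged flows."""
    mine = [f for f in flows if f.src == src]
    fixed = sum((f.demand for f in mine if f.converged), Fraction(0))
    pending = [f for f in mine if not f.converged]
    if pending:
        share = (1 - fixed) / len(pending)
        for f in pending:
            f.demand = share

def estimate_destination(flows, dst):
    """Receiver step: if over capacity, cap the flows above the equal share."""
    mine = [f for f in flows if f.dst == dst]
    if sum((f.demand for f in mine), Fraction(0)) <= 1:
        return                                   # receiver is not limited
    for f in mine:
        f.receiver_limited = True
    sender_limited_total = Fraction(0)
    share = Fraction(1, len(mine))
    while True:
        moved = False
        for f in mine:
            if f.receiver_limited and f.demand < share:
                # Flow wants less than the equal share: it is sender-limited.
                sender_limited_total += f.demand
                f.receiver_limited = False
                moved = True
        limited = [f for f in mine if f.receiver_limited]
        share = (1 - sender_limited_total) / len(limited)
        if not moved:
            break
    for f in limited:
        f.demand = share
        f.converged = True

def estimate_demands(pairs):
    flows = [Flow(s, d) for s, d in pairs]
    sources = sorted({f.src for f in flows})
    destinations = sorted({f.dst for f in flows})
    while True:
        before = [f.demand for f in flows]
        for s in sources:
            estimate_source(flows, s)
        for d in destinations:
            estimate_destination(flows, d)
        if [f.demand for f in flows] == before:  # nothing changed: done
            return {f.src + f.dst: f.demand for f in flows}

demands = estimate_demands([("A", "X"), ("A", "Y"), ("B", "Y"), ("C", "Y")])
print(demands)
```

Exact fractions are used instead of floats so that the stopping test (no demand changed in a full pass) is not affected by rounding.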
Demand Estimation
[Figure: senders A, B, C; receivers X, Y]

Flow  Estimate  Conv.?
AX    -         -
AY    -         -
BY    -         -
CY    -         -

Senders:
Sender  Available unconv. BW  Flows  Share
A       1                     2      1/2
B       1                     1      1
C       1                     1      1
Demand Estimation
[Figure: senders A, B, C; receivers X, Y]

Receivers (RL = receiver-limited, SL = sender-limited):
Recv  RL?  Non-SL flows  Share
X     No   -             -
Y     Yes  3             1/3

Flow  Estimate  Conv.?
AX    1/2       -
AY    1/2       -
BY    1         -
CY    1         -
Demand Estimation
[Figure: senders A, B, C; receivers X, Y]

Flow  Estimate  Conv.?
AX    1/2       -
AY    1/3       Yes
BY    1/3       Yes
CY    1/3       Yes

Senders:
Sender  Available unconv. BW  Flows  Share
A       2/3                   1      2/3
B       0                     0      0
C       0                     0      0
Demand Estimation
[Figure: senders A, B, C; receivers X, Y]

Flow  Estimate  Conv.?
AX    2/3       Yes
AY    1/3       Yes
BY    1/3       Yes
CY    1/3       Yes

Receivers (RL = receiver-limited, SL = sender-limited):
Recv  RL?  Non-SL flows  Share
X     No   -             -
Y     No   -             -
Placement Heuristics
• Find a good large-flow-to-core mapping
– such that average bisection BW is maximized
• Two approaches
• Global First Fit: Greedily choose a path that
has sufficient unreserved BW
– O([ports/switch]^2)
• Simulated Annealing: Iteratively find a
globally better mapping of paths to flows
– O(# flows)
Global First-Fit
• New flow found: linearly search all paths from S to D
• Place it on the first path whose links can all fit the flow
• Once the flow ends, entries + reservations time out
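The first-fit search above can be sketched as follows. Paths are modeled as lists of link names and `reserved` as a dict of per-link reservations. These structures, the function name `global_first_fit`, and the choice to return `None` when nothing fits (leaving the flow on its default path) are assumptions for illustration; reservation time-outs are not modeled.

```python
def global_first_fit(flow_demand, candidate_paths, reserved, capacity=1.0):
    """Reserve and return the first path whose links can all fit the flow."""
    for path in candidate_paths:
        if all(reserved.get(link, 0.0) + flow_demand <= capacity for link in path):
            for link in path:
                reserved[link] = reserved.get(link, 0.0) + flow_demand
            return path
    return None  # assumption: no path fits, so the flow is left where it is

# Hypothetical S -> D paths through core switches 0..3
paths = [[f"S-core{c}", f"core{c}-D"] for c in range(4)]
reserved = {}

placed_a = global_first_fit(0.6, paths, reserved)   # fits on core 0
placed_b = global_first_fit(0.6, paths, reserved)   # core 0 is full, goes to core 1
placed_c = global_first_fit(0.3, paths, reserved)   # still fits on core 0
print(placed_a, placed_b, placed_c)
```

The search is linear in the number of candidate paths, which is why the cost grows with the square of the ports per switch in a fat-tree.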
[Figure: scheduler placing Flows A, B, C from S to D across core switches 0-3]
Simulated Annealing
• Annealing: letting metal cool down
to reach a better crystal structure
– Heating up to enter a higher energy state
– Cooling to a lower energy state with a
better structure, stopping at a set temperature
• Simulated Annealing:
– Search the neighborhood for possible states
– Probabilistically accept a worse state
– Accept a better state, settle gradually
– Avoid local minima
Simulated Annealing
• State / State Space
– Possible solutions
• Energy
– Objective
• Neighborhood
– Other options
• Boltzmann’s Function
– Prob. of moving to a higher-energy state
• Control Temperature
– Current temp. affects
prob. of moving to a higher-energy state
• Cooling Schedule
– How temp. falls
• Stopping Criterion
Acceptance probability (Boltzmann): $P(\Delta E) = e^{-\Delta E / t}$
Simulated Annealing
• State Space:
– All possible large-flow-to-core mappings
– However, flows to the same destination map to the same core
– Reduces the state space, as long as there are not too many
large flows and the threshold is set properly
• Neighborhood:
– Swap cores for two hosts within the same pod,
attached to the same edge / aggregation switch
– Avoids local minima
Simulated Annealing
• Energy:
– Computed from the estimated demands of flows
– Total exceeded BW capacity of links; to be minimized
• Temperature: remaining iterations
• Probability:
• Final state is published to switches and
used as initial state for next round
• Incremental calculation of exceeded capacity
• No recalculation over all links; only for newly found large
flows and neighborhood swaps
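A simplified sketch of the annealing loop described above. The state maps each destination host to a core switch, the neighbor move swaps the cores of two hosts, the temperature is the number of remaining iterations, and the energy is the total demand exceeding capacity. Several parts are assumptions made to keep the example short: the energy is computed per core switch rather than over all links, the acceptance rule is the generic form `exp(-ΔE / t)` rather than the paper's exact probability, and the names `energy` and `anneal`, the host names, and the demand values (in percent of link capacity) are hypothetical.

```python
import math
import random

def energy(assignment, demand, capacity=100):
    """Total demand exceeding capacity, summed over core switches."""
    load = {}
    for host, core in assignment.items():
        load[core] = load.get(core, 0) + demand[host]
    return sum(max(0, used - capacity) for used in load.values())

def anneal(initial, demand, iterations=200, seed=0):
    rng = random.Random(seed)
    current = dict(initial)
    e_current = energy(current, demand)
    best, e_best = dict(current), e_current
    hosts = sorted(current)
    for t in range(iterations, 0, -1):       # temperature = remaining iterations
        a, b = rng.sample(hosts, 2)          # neighbor: swap cores of two hosts
        neighbor = dict(current)
        neighbor[a], neighbor[b] = current[b], current[a]
        e_neighbor = energy(neighbor, demand)
        delta = e_neighbor - e_current
        # Always accept a state that is no worse; accept a worse one with a
        # probability that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, e_current = neighbor, e_neighbor
            if e_current < e_best:
                best, e_best = dict(current), e_current
    return best, e_best

# Four destination hosts in one pod; demands in percent of link capacity
demand = {"h0": 90, "h1": 80, "h2": 10, "h3": 10}
initial = {"h0": "c0", "h1": "c0", "h2": "c1", "h3": "c1"}
best, e_best = anneal(initial, demand)
print(energy(initial, demand), "->", e_best, best)
```

Returning the best state seen, rather than the last one, guards against ending on a worse state accepted late in the run; a real scheduler would also seed the next round with this state, as the slide notes.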
Evaluation
Implementation
• 16 hosts, k=4 fat-tree data plane
– 20 switches: 4-port NetFPGAs / OpenFlow
– Parallel 48-port non-blocking Quanta switch
– 1 scheduler, OpenFlow control protocol
– Testbed: PortLand
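The host and switch counts quoted in the evaluation follow from the standard k-ary fat-tree formulas (k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches, and k^3/4 hosts). The helper name `fat_tree_sizes` is hypothetical; the sketch only checks the arithmetic.

```python
def fat_tree_sizes(k):
    """Element counts of a k-ary fat-tree built from k-port switches."""
    if k % 2:
        raise ValueError("k must be even")
    half = k // 2
    edge = k * half              # k pods, k/2 edge switches each
    aggregation = k * half       # k pods, k/2 aggregation switches each
    core = half * half
    return {
        "hosts": k ** 3 // 4,
        "edge": edge,
        "aggregation": aggregation,
        "core": core,
        "switches": edge + aggregation + core,
        # one path per core switch between hosts in different pods
        "inter_pod_paths": core,
    }

for k in (4, 32, 48):
    print(k, fat_tree_sizes(k))
```

For k=4 this gives 16 hosts and 20 switches, matching the testbed; k=32 gives 8,192 hosts, and k=48 gives 27,648 hosts, which corresponds to the "27K hosts" figure.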
Simulator
• k=32; 8,192 hosts
– Packet-level simulators not applicable
– 1Gbps for 8k hosts takes 2.5 × 10^11 pkts
• Model TCP flows
– TCP’s AIMD when constrained by topology
– Poisson arrival of flows
– No pkt size variations
– No bursty traffic
– No inter-flow dynamics
[Figure: results on the PortLand/OpenFlow testbed, k=4]
[Figure: results from the simulator]
Reactiveness
• Demand Estimation:
– 27K hosts, 250K flows, converges < 200ms
• Simulated Annealing:
– Runtime depends asymptotically on # of flows + # of
iterations; 50K flows and 1K iter.: 11ms
– Most of final bisection BW: few hundred iter.
• Scheduler control loop:
– Polling + Est. + SA = 145ms for 27K hosts
Comments
• Flows destined to the same host go via the same core
– May congest at cores, but how severe?
– Large flows to/from a host: <k/2
– No proof, no evaluation
• Decreases search space and runtime
– Scalable on a per-flow basis? For large k?
• No protection for mice flows, RPCs
– Only assumes they work well under ECMP
– Does not address their sharing routes with large flows
Comments
• Own flow-level simulator
– Aims to saturate the network
– No breakdown of flow counts by size
– Traffic generation: avg. flow size and arrival
rates (Poisson) with a mean
– Only the above descriptions, no specific numbers
– Too ideal or not volatile enough?
– Reports avg. bisection BW, but no real-time graphs?
• States that per-flow VLB = per-flow ECMP
– Does not compare with other options (VL2)
– No further elaboration
Comments
• Shared responsibility
– Controller only deals with critical situations
– Switches perform default measures
– Improves performance and saves time
– How to strike a balance?
– Adapt to different problems?
• Default multipath routing
– States problems of per-flow VLB and ECMP
– How about per-pkt? Authors’ future work
– How to improve switches’ default actions?
Comments
• Critical controller actions
– Considers that large flows degrade overall efficiency
– What are critical situations?
– How to detect and react?
– How to improve reactiveness and adaptability?
• Amin Vahdat’s lab
– Proposes fat-tree topology
– Develops PortLand L2 virtualization
– Hedera: enhances multipath performance
– Integrates all of the above
Supplement
Fault-Tolerance
• Link / Switch failure
– Use PortLand’s fault notification protocol
– Hedera routes around failed components
[Figure: scheduler and core switches 0-3 carrying Flows A, B, C]
Fault-Tolerance
• Scheduler failure
– Soft-state, not required for correctness
(connectivity)
– Switches fall back to ECMP
[Figure: scheduler and core switches 0-3 carrying Flows A, B, C]
Limitations
• Dynamic workloads, large flow turnover faster than control loop
– Scheduler will be continually chasing the traffic matrix
• Need to include penalty term for unnecessary SA flow re-assignments
[Figure: flow size vs. traffic-matrix stability (stable to unstable), marking ECMP and Hedera regions]

More Related Content

What's hot

Advanced regression and model selection
Advanced regression and model selectionAdvanced regression and model selection
Advanced regression and model selectionAnkit Jain
 
ShuffleNet - PR054
ShuffleNet - PR054ShuffleNet - PR054
ShuffleNet - PR054Jinwon Lee
 
SQL Server Alta disponibilidad en ambientes empresariales
SQL Server Alta disponibilidad en ambientes empresarialesSQL Server Alta disponibilidad en ambientes empresariales
SQL Server Alta disponibilidad en ambientes empresarialesEduardo Castro
 
Getting started with MariaDB with Docker
Getting started with MariaDB with DockerGetting started with MariaDB with Docker
Getting started with MariaDB with DockerMariaDB plc
 
19. Distributed Databases in DBMS
19. Distributed Databases in DBMS19. Distributed Databases in DBMS
19. Distributed Databases in DBMSkoolkampus
 
Machine Learning Summer School 2016
Machine Learning Summer School 2016Machine Learning Summer School 2016
Machine Learning Summer School 2016chris wiggins
 
PR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sPR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sJinwon Lee
 
Multi-Region Cassandra Clusters
Multi-Region Cassandra ClustersMulti-Region Cassandra Clusters
Multi-Region Cassandra ClustersInstaclustr
 
Data ingestion
Data ingestionData ingestion
Data ingestionnitheeshe2
 
Collaborative Filtering using KNN
Collaborative Filtering using KNNCollaborative Filtering using KNN
Collaborative Filtering using KNNŞeyda Hatipoğlu
 
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...Praxitelis Nikolaos Kouroupetroglou
 
Graph Attention Networks.pptx
Graph Attention Networks.pptxGraph Attention Networks.pptx
Graph Attention Networks.pptxssuser2624f71
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with HadoopNalini Mehta
 

What's hot (20)

Advanced regression and model selection
Advanced regression and model selectionAdvanced regression and model selection
Advanced regression and model selection
 
CockroachDB
CockroachDBCockroachDB
CockroachDB
 
ShuffleNet - PR054
ShuffleNet - PR054ShuffleNet - PR054
ShuffleNet - PR054
 
SQL Server Alta disponibilidad en ambientes empresariales
SQL Server Alta disponibilidad en ambientes empresarialesSQL Server Alta disponibilidad en ambientes empresariales
SQL Server Alta disponibilidad en ambientes empresariales
 
Getting started with MariaDB with Docker
Getting started with MariaDB with DockerGetting started with MariaDB with Docker
Getting started with MariaDB with Docker
 
19. Distributed Databases in DBMS
19. Distributed Databases in DBMS19. Distributed Databases in DBMS
19. Distributed Databases in DBMS
 
Machine Learning Summer School 2016
Machine Learning Summer School 2016Machine Learning Summer School 2016
Machine Learning Summer School 2016
 
Distributed DBMS - Unit 5 - Semantic Data Control
Distributed DBMS - Unit 5 - Semantic Data ControlDistributed DBMS - Unit 5 - Semantic Data Control
Distributed DBMS - Unit 5 - Semantic Data Control
 
PR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sPR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020s
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Multi-Region Cassandra Clusters
Multi-Region Cassandra ClustersMulti-Region Cassandra Clusters
Multi-Region Cassandra Clusters
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Distributed DBMS - Unit 3 - Distributed DBMS Architecture
Distributed DBMS - Unit 3 - Distributed DBMS ArchitectureDistributed DBMS - Unit 3 - Distributed DBMS Architecture
Distributed DBMS - Unit 3 - Distributed DBMS Architecture
 
Data ingestion
Data ingestionData ingestion
Data ingestion
 
02 Data Mining
02 Data Mining02 Data Mining
02 Data Mining
 
Collaborative Filtering using KNN
Collaborative Filtering using KNNCollaborative Filtering using KNN
Collaborative Filtering using KNN
 
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
 
Graph Attention Networks.pptx
Graph Attention Networks.pptxGraph Attention Networks.pptx
Graph Attention Networks.pptx
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
DDBMS
DDBMSDDBMS
DDBMS
 

Similar to Hedera - Dynamic Flow Scheduling for Data Center Networks, an Application of Software-Defined Networking (SDN)

Valiant Load Balancing and Traffic Oblivious Routing
Valiant Load Balancing and Traffic Oblivious RoutingValiant Load Balancing and Traffic Oblivious Routing
Valiant Load Balancing and Traffic Oblivious RoutingJason TC HOU (侯宗成)
 
A Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network ArchitectureA Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network ArchitectureGunawan Jusuf
 
FATTREE: A scalable Commodity Data Center Network Architecture
FATTREE: A scalable Commodity Data Center Network ArchitectureFATTREE: A scalable Commodity Data Center Network Architecture
FATTREE: A scalable Commodity Data Center Network ArchitectureAnkita Mahajan
 
DevoFlow - Scaling Flow Management for High-Performance Networks
DevoFlow - Scaling Flow Management for High-Performance NetworksDevoFlow - Scaling Flow Management for High-Performance Networks
DevoFlow - Scaling Flow Management for High-Performance NetworksJason TC HOU (侯宗成)
 
24-ad-hoc.ppt
24-ad-hoc.ppt24-ad-hoc.ppt
24-ad-hoc.pptsumadi26
 
Energy Efficient Routing Approaches in Ad-hoc Networks
                Energy Efficient Routing Approaches in Ad-hoc Networks                Energy Efficient Routing Approaches in Ad-hoc Networks
Energy Efficient Routing Approaches in Ad-hoc NetworksKishan Patel
 
Introduction to backwards learning algorithm
Introduction to backwards learning algorithmIntroduction to backwards learning algorithm
Introduction to backwards learning algorithmRoshan Karunarathna
 
layer2-network-design.ppt
layer2-network-design.pptlayer2-network-design.ppt
layer2-network-design.pptPatrick Theuri
 
layer2-network-design.ppt
layer2-network-design.pptlayer2-network-design.ppt
layer2-network-design.pptVimalMallick
 
12-adhocssasalirezaalirezalakakssaas.ppt
12-adhocssasalirezaalirezalakakssaas.ppt12-adhocssasalirezaalirezalakakssaas.ppt
12-adhocssasalirezaalirezalakakssaas.pptalirezakgm
 
Routing protocols-network-layer
Routing protocols-network-layerRouting protocols-network-layer
Routing protocols-network-layerNitesh Singh
 
AusNOG 2019: TCP and BBR
AusNOG 2019: TCP and BBRAusNOG 2019: TCP and BBR
AusNOG 2019: TCP and BBRAPNIC
 
CS553_ST7_Ch15-LANOverview (1).ppt
CS553_ST7_Ch15-LANOverview (1).pptCS553_ST7_Ch15-LANOverview (1).ppt
CS553_ST7_Ch15-LANOverview (1).pptMekiPetitSeg
 
CS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptCS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptSmitNiks
 
CS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptCS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptssuser2cc0d4
 
NZNOG 2020: Buffers, Buffer Bloat and BBR
NZNOG 2020: Buffers, Buffer Bloat and BBRNZNOG 2020: Buffers, Buffer Bloat and BBR
NZNOG 2020: Buffers, Buffer Bloat and BBRAPNIC
 
Tcp congestion control topic in high speed network
Tcp congestion control topic  in high speed networkTcp congestion control topic  in high speed network
Tcp congestion control topic in high speed networkGOKULKANNANMMECLECTC
 
RIPE 76: TCP and BBR
RIPE 76: TCP and BBRRIPE 76: TCP and BBR
RIPE 76: TCP and BBRAPNIC
 

Similar to Hedera - Dynamic Flow Scheduling for Data Center Networks, an Application of Software-Defined Networking (SDN) (20)

Data Center Network Multipathing
Data Center Network MultipathingData Center Network Multipathing
Data Center Network Multipathing
 
Valiant Load Balancing and Traffic Oblivious Routing
Valiant Load Balancing and Traffic Oblivious RoutingValiant Load Balancing and Traffic Oblivious Routing
Valiant Load Balancing and Traffic Oblivious Routing
 
A Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network ArchitectureA Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network Architecture
 
FATTREE: A scalable Commodity Data Center Network Architecture
FATTREE: A scalable Commodity Data Center Network ArchitectureFATTREE: A scalable Commodity Data Center Network Architecture
FATTREE: A scalable Commodity Data Center Network Architecture
 
DevoFlow - Scaling Flow Management for High-Performance Networks
DevoFlow - Scaling Flow Management for High-Performance NetworksDevoFlow - Scaling Flow Management for High-Performance Networks
DevoFlow - Scaling Flow Management for High-Performance Networks
 
24-ad-hoc.ppt
24-ad-hoc.ppt24-ad-hoc.ppt
24-ad-hoc.ppt
 
Energy Efficient Routing Approaches in Ad-hoc Networks
                Energy Efficient Routing Approaches in Ad-hoc Networks                Energy Efficient Routing Approaches in Ad-hoc Networks
Energy Efficient Routing Approaches in Ad-hoc Networks
 
Introduction to backwards learning algorithm
Introduction to backwards learning algorithmIntroduction to backwards learning algorithm
Introduction to backwards learning algorithm
 
layer2-network-design.ppt
layer2-network-design.pptlayer2-network-design.ppt
layer2-network-design.ppt
 
layer2-network-design.ppt
layer2-network-design.pptlayer2-network-design.ppt
layer2-network-design.ppt
 
12-adhocssasalirezaalirezalakakssaas.ppt
12-adhocssasalirezaalirezalakakssaas.ppt12-adhocssasalirezaalirezalakakssaas.ppt
12-adhocssasalirezaalirezalakakssaas.ppt
 
Quality of service
Quality of serviceQuality of service
Quality of service
 
Routing protocols-network-layer
Routing protocols-network-layerRouting protocols-network-layer
Routing protocols-network-layer
 
AusNOG 2019: TCP and BBR
AusNOG 2019: TCP and BBRAusNOG 2019: TCP and BBR
AusNOG 2019: TCP and BBR
 
CS553_ST7_Ch15-LANOverview (1).ppt
CS553_ST7_Ch15-LANOverview (1).pptCS553_ST7_Ch15-LANOverview (1).ppt
CS553_ST7_Ch15-LANOverview (1).ppt
 
CS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptCS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.ppt
 
CS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptCS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.ppt
 
NZNOG 2020: Buffers, Buffer Bloat and BBR
NZNOG 2020: Buffers, Buffer Bloat and BBRNZNOG 2020: Buffers, Buffer Bloat and BBR
NZNOG 2020: Buffers, Buffer Bloat and BBR
 
Tcp congestion control topic in high speed network
Tcp congestion control topic  in high speed networkTcp congestion control topic  in high speed network
Tcp congestion control topic in high speed network
 
RIPE 76: TCP and BBR
RIPE 76: TCP and BBRRIPE 76: TCP and BBR
RIPE 76: TCP and BBR
 

More from Jason TC HOU (侯宗成)

More from Jason TC HOU (侯宗成) (11)

A Data Culture in Daily Work - Examples @ KKTV
A Data Culture in Daily Work - Examples @ KKTVA Data Culture in Daily Work - Examples @ KKTV
A Data Culture in Daily Work - Examples @ KKTV
 
Triangulating Data to Drive Growth
Triangulating Data to Drive GrowthTriangulating Data to Drive Growth
Triangulating Data to Drive Growth
 
Design & Growth @ KKTV - uP!ck Sharing
Design & Growth @ KKTV - uP!ck SharingDesign & Growth @ KKTV - uP!ck Sharing
Design & Growth @ KKTV - uP!ck Sharing
 
文武雙全的產品設計 DESIGNING WITH DATA
文武雙全的產品設計 DESIGNING WITH DATA文武雙全的產品設計 DESIGNING WITH DATA
文武雙全的產品設計 DESIGNING WITH DATA
 
Growth @ KKTV
Growth @ KKTVGrowth @ KKTV
Growth @ KKTV
 
Growth 的基石 用戶行為追蹤
Growth 的基石   用戶行為追蹤Growth 的基石   用戶行為追蹤
Growth 的基石 用戶行為追蹤
 
App 的隱形殺手 - 留存率
App 的隱形殺手 - 留存率App 的隱形殺手 - 留存率
App 的隱形殺手 - 留存率
 
Software-Defined Networking , Survey of HotSDN 2012
Software-Defined Networking , Survey of HotSDN 2012Software-Defined Networking , Survey of HotSDN 2012
Software-Defined Networking , Survey of HotSDN 2012
 
Software-Defined Networking SDN - A Brief Introduction
Software-Defined Networking SDN - A Brief IntroductionSoftware-Defined Networking SDN - A Brief Introduction
Software-Defined Networking SDN - A Brief Introduction
 
Introduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network IssuesIntroduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network Issues
 
OpenStack Framework Introduction
OpenStack Framework IntroductionOpenStack Framework Introduction
OpenStack Framework Introduction
 

Recently uploaded

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Recently uploaded (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

Hedera - Dynamic Flow Scheduling for Data Center Networks, an Application of Software-Defined Networking (SDN)

  • 1. Hedera: Dynamic Flow Scheduling for Data Center Network Mohammad Al-Fares, Sivasankar Radhakrishnan, Barath Raghavan, Nelson Huang, Amin Vahdat - USENIX NSDI 2010 - 1 Presenter: Jason, Tsung-Cheng, HOU Advisor: Wanjiun Liao Dec. 22nd, 2011
  • 2. Problem • Relying on multipathing, due to… – Limited port densities of routers/switches – Horizontal expansion • Multi-rooted tree topologies – Example: Fat-tree / Clos 2
  • 3. Problem • BW demand is essential and volatile – Must route among multiple paths – Avoid bottlenecks and deliver aggre. BW • However, current multipath routing… – Mostly: flow-hash-based ECMP – Static and oblivious to link-utilization – Causes long-term large-flow collisions • Inefficiently utilizing path diversity – Need a protocol or a scheduler 3
  • 4. Collisions of elephant flows • Collisions in two ways: Upward or Downward D1S1 D2S2 D3S3 D4S4
  • 5. Equal Cost Paths • Many equal cost paths going up to the core switches • Only one path down from each core switch • Need to find good flow-to-core mapping DS
  • 6. Goal • Given dynamic flow demands – Need to find paths that maximize network bisection BW – No end-host modifications • However, local switch information alone cannot find a proper allocation – Need a central scheduler – Must use commodity Ethernet switches – OpenFlow
  • 7. Architecture • Detect Large Flows – Flows that need bandwidth but are network-limited • Estimate Flow Demands – Use max-min fairness to allocate flows between source-destination (S/D) pairs • Allocate Flows – Use estimated demands to heuristically find a better placement of large flows on the equal-cost paths – Arrange switches and iterate again [Figure: Detect Large Flows → Estimate Flow Demands → Allocate Flows]
  • 8. Architecture • Feedback loop • Optimize achievable bisection BW by assigning flow-to-core mappings • Heuristics for flow demand estimation and placement • Central Scheduler – Global knowledge of all links in the network – Controls the tables of all switches (OpenFlow) [Figure: Detect Large Flows → Estimate Flow Demands → Allocate Flows]
  • 10. Elephant Detection • Scheduler polls edge switches – Flows exceeding a threshold are “large” – Threshold: 10% of a host's link capacity (> 100 Mbps) • Small flows: default ECMP hashing • Hedera complements ECMP – Default forwarding is ECMP – Only schedules large flows contributing to bisection BW bottlenecks • Centralized functions: only the essentials
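The threshold rule on the slide above can be sketched in a few lines of Python. This is an illustration only: the function name `split_flows`, the dictionary of polled rates, and the 1 Gbps link capacity are assumptions (the slide gives only the 10% rule and the > 100 Mbps figure), and how the rates are polled from edge switches is not modeled.

```python
LINK_CAPACITY_BPS = 1_000_000_000  # assumed 1 Gbps host link


def split_flows(flow_rates_bps, capacity_bps=LINK_CAPACITY_BPS):
    """Split polled flows into (large, small) sets.

    A flow is "large" when its rate exceeds 10% of the host link capacity;
    everything else stays on default ECMP forwarding.
    """
    threshold = capacity_bps // 10  # 10% of link capacity
    large = {f for f, rate in flow_rates_bps.items() if rate > threshold}
    small = set(flow_rates_bps) - large
    return large, small


if __name__ == "__main__":
    rates = {"a": 150_000_000, "b": 50_000_000, "c": 100_000_000}
    large, small = split_flows(rates)
    print(sorted(large), sorted(small))
```

Only the flows in `large` would be handed to demand estimation and placement; the rest never reach the scheduler.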
  • 12. Demand Estimation • Current flow rate is misleading – May already be constrained by the network • Need to find a flow’s “natural” BW demand when not limited by the network – As if only limited by the NIC of S or D • Allocate S/D capacity among flows using max-min fairness • Equals the BW allocation under optimal routing; serves as input to the placement algorithm
  • 13. Demand Estimation • Given pairs of large flows, modify each flow size at S/D iteratively – S distributes its unconverged BW among its flows – D, if receiver-limited, redistributes BW among excessive-demand flows – Repeat until all flows converge • Guaranteed to converge in O(|F|) – Linear in the number of flows
  • 14. Demand Estimation [Figure: senders A, B, C; receivers X, Y]
      Flows:
        Flow  Estimate  Conv.?
        AX
        AY
        BY
        CY
      Senders:
        Sender  Available Unconv. BW  Flows  Share
        A       1                     2      1/2
        B       1                     1      1
        C       1                     1      1
  • 15. Demand Estimation [Figure: senders A, B, C; receivers X, Y]
      Receivers (RL = receiver-limited, SL = sender-limited):
        Recv  RL?  Non-SL Flows  Share
        X     No   -             -
        Y     Yes  3             1/3
      Flows:
        Flow  Estimate  Conv.?
        AX    1/2
        AY    1/2
        BY    1
        CY    1
  • 16. Demand Estimation [Figure: senders A, B, C; receivers X, Y]
      Flows:
        Flow  Estimate  Conv.?
        AX    1/2
        AY    1/3       Yes
        BY    1/3       Yes
        CY    1/3       Yes
      Senders:
        Sender  Available Unconv. BW  Flows  Share
        A       2/3                   1      2/3
        B       0                     0      0
        C       0                     0      0
  • 17. Demand Estimation [Figure: senders A, B, C; receivers X, Y]
      Flows:
        Flow  Estimate  Conv.?
        AX    2/3       Yes
        AY    1/3       Yes
        BY    1/3       Yes
        CY    1/3       Yes
      Receivers:
        Recv  RL?  Non-SL Flows  Share
        X     No   -             -
        Y     No   -             -
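The worked example on slides 14 to 17 can be reproduced with a short Python sketch of the iterative sender/receiver procedure. It is written from the slide description, not taken from the authors' code. Assumptions: every NIC has normalized capacity 1, there is at most one large flow per (source, destination) pair, and the names `estimate_demands`, `_est_src`, and `_est_dst` are hypothetical.

```python
from fractions import Fraction


def _est_src(src, flows, demand, converged):
    """Sender step: split the sender's unconverged BW equally among open flows."""
    mine = [f for f in flows if f[0] == src]
    fixed = sum(demand[f] for f in mine if converged[f])
    open_flows = [f for f in mine if not converged[f]]
    if open_flows:
        share = (1 - fixed) / len(open_flows)
        for f in open_flows:
            demand[f] = share


def _est_dst(dst, flows, demand, converged):
    """Receiver step: if oversubscribed, cap the excessive-demand flows."""
    mine = [f for f in flows if f[1] == dst]
    if sum(demand[f] for f in mine) <= 1:
        return  # receiver is not the bottleneck
    limited = set(mine)           # flows still treated as receiver-limited
    small_total = Fraction(0)     # BW used by flows below the equal share
    share = Fraction(1, len(limited))
    while True:
        small = {f for f in limited if demand[f] < share}
        if not small:
            break
        small_total += sum(demand[f] for f in small)
        limited -= small
        share = (1 - small_total) / len(limited)
    for f in limited:
        demand[f] = share
        converged[f] = True


def estimate_demands(flows):
    """flows: list of (src, dst) pairs. Returns {flow: Fraction demand}."""
    demand = {f: Fraction(0) for f in flows}
    converged = {f: False for f in flows}
    while True:
        before = dict(demand)
        for src in sorted({s for s, _ in flows}):
            _est_src(src, flows, demand, converged)
        for dst in sorted({d for _, d in flows}):
            _est_dst(dst, flows, demand, converged)
        if demand == before:  # no estimate changed: done
            return demand


if __name__ == "__main__":
    flows = [("A", "X"), ("A", "Y"), ("B", "Y"), ("C", "Y")]
    for flow, d in estimate_demands(flows).items():
        print(flow, d)
```

`Fraction` is used so the shares compare exactly; with floats the termination check would need a tolerance.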
  • 19. Placement Heuristics • Find a good large-flow-to-core mapping – such that average bisection BW is maximized • Two approaches • Global First Fit: Greedily choose a path that has sufficient unreserved BW – O((ports/switch)^2) • Simulated Annealing: Iteratively find a globally better mapping of paths to flows – O(# flows)
  • 20. Global First-Fit • When a new flow is found, linearly search all paths from S to D • Place it on the first path whose links can all fit the flow • Once the flow ends, entries + reservations time out [Figure: scheduler placing Flows A, B, C across cores 0 to 3 between S and D]
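A minimal Python sketch of the first-fit search described on the slide above. The function name `global_first_fit`, the link identifiers, and the normalized link capacity of 1.0 are illustrative assumptions. Returning `None` when nothing fits, so that the flow stays on its default ECMP path, is also an assumption; the slides do not state that case.

```python
def global_first_fit(flow_demand, candidate_paths, reserved, capacity=1.0):
    """Place a flow on the first path whose links all have room.

    candidate_paths: list of paths, each a list of link ids.
    reserved: dict link id -> bandwidth already reserved (mutated in place).
    Returns the chosen path, or None if no path can fit the flow.
    """
    for path in candidate_paths:
        if all(reserved.get(link, 0.0) + flow_demand <= capacity for link in path):
            for link in path:
                reserved[link] = reserved.get(link, 0.0) + flow_demand
            return path
    return None  # assumption: the flow keeps its default ECMP route


if __name__ == "__main__":
    via_core0 = ["edge0-agg0", "agg0-core0"]
    via_core2 = ["edge0-agg1", "agg1-core2"]
    reserved = {}
    for demand in (0.75, 0.5, 0.25, 0.75):
        print(demand, global_first_fit(demand, [via_core0, via_core2], reserved))
```

Reservation timeouts when a flow ends are left out; a caller would subtract the demand from each link of the path at that point.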
  • 21. Simulated Annealing • Annealing: letting metal cool down to reach a better crystal structure – Heating up to enter a higher energy state – Cooling to a lower energy state with a better structure and stopping at a given temperature • Simulated Annealing: – Search the neighborhood for possible states – Probabilistically accept a worse state – Accept a better state, settle gradually – Avoid local minima
  • 22. Simulated Annealing • State / State Space – Possible solutions • Energy – Objective • Neighborhood – Other options • Boltzmann’s Function – Prob. of moving to a higher state • Control Temperature – Current temp. affects prob. of moving to a higher state • Cooling Schedule – How temp. falls • Stopping Criterion [Equation garbled in extraction: probability P in terms of energy E and temperature t]
  • 23. Simulated Annealing • State Space: – All possible large-flow-to-core mappings – However, flows to the same destination map to the same core – Reduces the state space, as long as there are not too many large flows and the threshold is proper • Neighborhood: – Swap cores for two hosts within the same pod, attached to the same edge / aggregate – Avoids local minima
  • 24. Simulated Annealing • Energy: – Estimated demand of flows – Total exceeded BW capacity of links (to be minimized) • Temperature: remaining iterations • Probability: • Final state is published to switches and used as the initial state for the next round • Incremental calculation of exceeded capacity • No recalculation over all links, only for newly found large flows and neighborhood swaps
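The ingredients listed on slides 21 to 24 can be put together in a small Python sketch. It follows the slides where they are specific (state = destination-to-core mapping, energy = total exceeded capacity, temperature = remaining iterations, neighbor = swap of two hosts' cores) and fills the rest with labeled assumptions: the acceptance rule `exp((E - E_new) / T)` is the textbook Boltzmann form, not necessarily the one on the slide, and the toy energy `core_overload` counts overload only at the cores rather than on every link. All function names are hypothetical.

```python
import math
import random


def core_overload(state, demand, capacity=1.0):
    """Toy energy: total demand exceeding capacity, summed over cores."""
    load = {}
    for dst, core in state.items():
        load[core] = load.get(core, 0.0) + demand[dst]
    return sum(max(0.0, l - capacity) for l in load.values())


def swap_two(state, rng):
    """Neighbor: swap the cores assigned to two destinations."""
    a, b = rng.sample(sorted(state), 2)
    nxt = dict(state)
    nxt[a], nxt[b] = state[b], state[a]
    return nxt


def anneal(initial, energy, neighbor, iterations, rng=None):
    """Return (best_state, best_energy) found within the iteration budget."""
    rng = rng or random.Random(0)
    state, e = initial, energy(initial)
    best, best_e = state, e
    for temperature in range(iterations, 0, -1):  # T = remaining iterations
        cand = neighbor(state, rng)
        cand_e = energy(cand)
        # Always accept improvements; accept worse states with a
        # probability that shrinks as the temperature falls (assumed form).
        if cand_e <= e or rng.random() < math.exp((e - cand_e) / temperature):
            state, e = cand, cand_e
            if e < best_e:
                best, best_e = state, e
    return best, best_e


if __name__ == "__main__":
    demand = {"d1": 0.6, "d2": 0.6, "d3": 0.3, "d4": 0.3}
    start = {"d1": 0, "d2": 0, "d3": 1, "d4": 1}  # both heavy hosts on core 0
    print(anneal(start, lambda s: core_overload(s, demand), swap_two, 200))
```

Seeding the next round with the returned state, as the slide describes, would amount to passing `best` back in as `initial`.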
  • 26. Implementation • 16 hosts, k=4 fat-tree data plane – 20 switches: 4-port NetFPGAs / OpenFlow – Parallel 48-port non-blocking Quanta switch – 1 scheduler, OpenFlow control protocol – Testbed: PortLand
  • 27. Simulator • k=32; 8,192 hosts – Packet-level simulators not applicable – 1 Gbps for 8K hosts takes 2.5×10^11 pkts • Model TCP flows – TCP’s AIMD when constrained by topology – Poisson arrival of flows – No pkt size variations – No bursty traffic – No inter-flow dynamics
  • 30. Reactiveness • Demand Estimation: – 27K hosts, 250K flows, converges in < 200 ms • Simulated Annealing: – Asymptotically dependent on # of flows + # iter.; 50K flows and 1K iter.: 11 ms – Most of the final bisection BW is reached within a few hundred iter. • Scheduler control loop: – Polling + Est. + SA = 145 ms for 27K hosts
  • 32. Comments • Flows destined to the same host go via the same core – May congest at cores, but how severely? – Large flows to/from a host: <k/2 – No proof, no evaluation • Decreases search space and runtime – Scalable on a per-flow basis? For large k? • No protection for mice flows, RPCs – Only assumes they work well under ECMP – Not addressed: what happens when they share routes with large flows
  • 33. Comments • Own flow-level simulator – Aims to saturate the network – No breakdown of flow counts by size – Traffic generation: avg. flow size and arrival rates (Poisson) with a mean – Only the above descriptions, no specific numbers – Too ideal or not volatile enough? – Avg. bisection BW, but real-time graphs? • States that per-flow VLB = per-flow ECMP – Does not compare with other options (VL2) – No further elaboration
  • 34. Comments • Shared responsibility – Controller only deals with critical situations – Switches perform default measures – Improves performance and saves time – How to strike a balance? – Adapt to different problems? • Default multipath routing – States problems of per-flow VLB and ECMP – How about per-pkt? Authors’ future work – How to improve switches’ default actions?
  • 35. Comments • Critical controller actions – Considers that large flows degrade overall efficiency – What are critical situations? – How to detect and react? – How to improve reactiveness and adaptability? • Amin Vahdat’s lab – Proposes fat-tree topology – Develops PortLand L2 virtualization – Hedera: enhances multipath performance – Integrates all of the above
  • 36. References • M. Al-Fares et al., “Hedera: Dynamic Flow Scheduling for Data Center Networks”, USENIX NSDI 2010 • Tathagata Das, “Hedera: Dynamic Flow Scheduling for Data Center Networks”, UC Berkeley course CS 294 • M. Al-Fares, “Hedera: Dynamic Flow Scheduling for Data Center Networks”, USENIX NSDI 2010, slides
  • 38. Fault-Tolerance • Link / Switch failure – Use PortLand’s fault notification protocol – Hedera routes around failed components [Figure: scheduler with Flows A, B, C across cores 0 to 3]
  • 39. Fault-Tolerance • Scheduler failure – Soft-state, not required for correctness (connectivity) – Switches fall back to ECMP [Figure: scheduler with Flows A, B, C across cores 0 to 3]
  • 40. Limitations • Dynamic workloads, large-flow turnover faster than the control loop – Scheduler will be continually chasing the traffic matrix • Need to include a penalty term for unnecessary SA flow re-assignments [Figure: flow size vs. traffic-matrix stability (stable to unstable), with regions labeled ECMP and Hedera]