SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Downloaden Sie, um offline zu lesen
Introduction Off-Line Problem On-Line Problem Summary Appendix
Dynamic Fractional Resource Scheduling
for Cluster Platforms
Mark Stillwell
Department of Information and Computer Sciences
University of Hawai’i at M¯anoa
Achievement Rewards for College Scientists
2010 Scholarship Awards Program
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
Clusters
Definition
A cluster is a group of independent computers, or nodes,
working together closely, usually connected by a high-speed
network.
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
Jobs
The system can accept user requests to run jobs or the
administrator can instantiate jobs directly
Running jobs are made up of nearly identical tasks
The number of tasks is specified by user/administrator
Tasks can block while communicating with each other
The assignment of resources to tasks is called scheduling
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
Current Approaches
Service Hosting (Off-Line scheduling)
traditionally, dedicated machines
current interest in server consolidation
few good theoretical models
heavy “engineering” bias
High-Performance Computing (On-Line Scheduling)
Usually First-Come-First-Served (FCFS) with backfilling
Backfilling needs (unreliable) compute time estimates
Unbounded wait times
Inefficient use of nodes/resources
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
Our Proposal
Use virtual machine technology.
Multiple tasks on one node
Performance isolation
Sharing of fractional resources
Define a run-time computable metric that captures notions
of performance and fairness.
Design heuristics that allocate resources to jobs while
explicitly trying to achieve high ratings by our metric.
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
Requirements, Needs, and Yield
Tasks have memory requirements and CPU needs
All tasks of a job have the same requirements and needs
For a task to be placed on a node there must be memory
available at least equal to its requirements
A task can be allocated less CPU than its need, and the
ratio of the allocation to the need is the yield
All tasks of a job must have the same yield, so we can also
speak of the yield of a job
The yield of a job gives its performance relative to if it were
run on a dedicated system
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
Off-Line Problem Assumptions
Steady-state execution with infinite jobs
Models an ideal service hosting environment
Makes the problem more tractable [Marchal et al., 2006]
Good when job duration longer than schedule time
Jobs are serial (single task)
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
Objective Function
The performance of a job is correlated to its yield
Maximizing the average or sum of the yields may be unfair
to some jobs
Instead, we seek to maximize the minimum yield
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
Task Placement Heuristics
Greedy Task Placement – Incremental, puts each task on
the node with the lowest computational load on which it
can fit without violating memory constraints
Randomized Rounding Task Placement – Based on
relaxing an MILP to an LP and rounding probabilistically
MCB Task Placement – Global, iteratively applies
multi-capacity (vector) bin-packing heuristics during a
binary search for the maximized minimum yield
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
Large Problem Set: Minimum Yield vs. Free Memory
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
0.3
0.35
0.4
0.45
0.5
0.55
Slack
MinimumYield
bound
MCB8
GR
GB
SG
SGB
RRND
RRNZ
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
Conclusions
The problem is tractable
Multi-capacity bin packing algorithms perform well
Greedy algorithms perform okay, and are very fast
Randomized rounding approaches are not very promising
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
On-Line Problem Assumptions
Job arrival/completion times are not known in advance
Jobs are temporary
The user wants a final result
Quick turnaround relative to runtime is desired
Jobs are not interactive
So can wait until resources are available to start
We avoid the use of runtime estimates
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
Stretch
Our goal: minimize maximum stretch (aka slowdown)
Stretch: the time a job spends in the system divided by the
time that would be spent in a dedicated
system [Bender et al., 1998]
Popular to quantify schedule quality post-mortem
Not generally used to make scheduling decisions
Runtime computation requires (unreliable) user estimates.
Minimizing average stretch prone to starvation
Minimizing maximum stretch captures notions of both
performance and fairness.
Our approach: try to maximize minimum yield
Similar, but not the same, as minimizing maximum stretch
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
Max Stretch Degradation vs. Load, no migration cost
1
10
100
1000
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
StretchDegradationFactor
Load
EASY
FCFS
GREEDY
GREEDYP
GREEDYPM
DynMCB8
MCB8P 600
MCB8PG 600
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
Conclusions
DFRS approaches can significantly outperform traditional
approaches
Aggressive repacking can lead to much better resource
allocations
But also to heavy migration costs
Greedy migration is not that useful
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
Summary
We have proposed a novel approach to job scheduling on
clusters, Dynamic Fractional Resource Scheduling, that
makes use of modern virtual machine technology and
seeks to optimize a runtime-computable, user-centric
measure of performance called the minimum yield
Multi-capacity bin packing heuristics can be used to find
good solutions
Our approach avoids the use of unreliable runtime
estimates
This approach has the potential to lead to
order-of-magnitude improvements in performance over
current technology
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
References I
Bender, M. A., Chakrabarti, S., and Muthukrishnan, S.
(1998).
Flow and Stretch Metrics for Scheduling Continuous Job
Streams.
In Proc. of the 9th ACM-SIAM Symp. On Discrete
Algorithms, pages 270–279.
Marchal, L., Yang, Y., Casanova, H., and Robert, Y. (2006).
Steady-state scheduling of multiple divisible load
applications on wide-area distributed computing platforms.
Intl. J. of High Performance Computing Applications,
20(3):365–381.
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters
Introduction Off-Line Problem On-Line Problem Summary Appendix
References II
Stillwell, M., Shanzenbach, D., Vivien, F., and Casanova,
H. (2009).
Resource Allocation using Virtual Clusters.
In CCGrid, pages 260–267. IEEE.
Stillwell, M., Vivien, F., and Casanova, H. (2010).
Dynamic fractional resource scheduling for HPC workloads.
In IPDPS.
to appear.
Mark Stillwell UH M¯anoa ICS
DFRS for Clusters

Weitere ähnliche Inhalte

Andere mochten auch

ギークハウスエチオピアProject 一部削除版
ギークハウスエチオピアProject 一部削除版ギークハウスエチオピアProject 一部削除版
ギークハウスエチオピアProject 一部削除版Naoki Kitamura
 
Resource Allocation using Virtual Clusters
Resource Allocation using Virtual ClustersResource Allocation using Virtual Clusters
Resource Allocation using Virtual ClustersMark Stillwell
 
Dynamic Fractional Resource Scheduling For HPC Workloads -- 2009, Lyon
Dynamic Fractional Resource Scheduling For HPC Workloads -- 2009, LyonDynamic Fractional Resource Scheduling For HPC Workloads -- 2009, Lyon
Dynamic Fractional Resource Scheduling For HPC Workloads -- 2009, LyonMark Stillwell
 
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...Mark Stillwell
 
DevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and dockerDevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and dockerMark Stillwell
 
DevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and dockerDevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and dockerMark Stillwell
 

Andere mochten auch (9)

Main
MainMain
Main
 
ギークハウスエチオピアProject 一部削除版
ギークハウスエチオピアProject 一部削除版ギークハウスエチオピアProject 一部削除版
ギークハウスエチオピアProject 一部削除版
 
Resource Allocation using Virtual Clusters
Resource Allocation using Virtual ClustersResource Allocation using Virtual Clusters
Resource Allocation using Virtual Clusters
 
Dynamic Fractional Resource Scheduling For HPC Workloads -- 2009, Lyon
Dynamic Fractional Resource Scheduling For HPC Workloads -- 2009, LyonDynamic Fractional Resource Scheduling For HPC Workloads -- 2009, Lyon
Dynamic Fractional Resource Scheduling For HPC Workloads -- 2009, Lyon
 
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...
Dynamic Fractional Resource Scheduling Practical Issues and Future Directions...
 
H2C
H2CH2C
H2C
 
DevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and dockerDevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and docker
 
DevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and dockerDevOps introduction with ansible, vagrant, and docker
DevOps introduction with ansible, vagrant, and docker
 
HARNESS project Demo
HARNESS project DemoHARNESS project Demo
HARNESS project Demo
 

Ähnlich wie Dynamic Fractional Resource Scheduling -- 2010, ARCS

An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEWShiyong Lu
 
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and RSvm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and RIRJET Journal
 
Compare Clustering Methods for MS SQL Server
Compare Clustering Methods for MS SQL ServerCompare Clustering Methods for MS SQL Server
Compare Clustering Methods for MS SQL ServerAlexDepo
 
Migration to Oracle 12c Made Easy Using Replication Technology
Migration to Oracle 12c Made Easy Using Replication TechnologyMigration to Oracle 12c Made Easy Using Replication Technology
Migration to Oracle 12c Made Easy Using Replication TechnologyDonna Guazzaloca-Zehl
 
Scalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesScalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesFinalyear Projects
 
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...Finalyear Projects
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weitingWei Ting Chen
 
A Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System UptimeA Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System UptimeYogeshIJTSRD
 
MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1blewington
 
An Optimized-Throttled Algorithm for Distributing Load in Cloud Computing
An Optimized-Throttled Algorithm for Distributing Load in Cloud ComputingAn Optimized-Throttled Algorithm for Distributing Load in Cloud Computing
An Optimized-Throttled Algorithm for Distributing Load in Cloud ComputingIRJET Journal
 
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & Wieck
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & WieckIBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & Wieck
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & WieckIBM Events
 
CirrusDB Cloud Effect
CirrusDB Cloud EffectCirrusDB Cloud Effect
CirrusDB Cloud EffectAshok Sami
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...Ryousei Takano
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliencyMasashi Narumoto
 
VMworld 2013: SDDC IT Operations Transformation: Multi-customer Lessons Learned
VMworld 2013: SDDC IT Operations Transformation:  Multi-customer Lessons LearnedVMworld 2013: SDDC IT Operations Transformation:  Multi-customer Lessons Learned
VMworld 2013: SDDC IT Operations Transformation: Multi-customer Lessons LearnedVMworld
 
Error tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud systemError tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud systemJPINFOTECH JAYAPRAKASH
 
DB2 for z/O S Data Sharing
DB2 for z/O S  Data  SharingDB2 for z/O S  Data  Sharing
DB2 for z/O S Data SharingSurekha Parekh
 
High Availability Disaster Recovery Customer Success Stories[1]
High Availability Disaster Recovery Customer Success Stories[1]High Availability Disaster Recovery Customer Success Stories[1]
High Availability Disaster Recovery Customer Success Stories[1]Michael Hudak
 
IEEE 2014 JAVA DATA MINING PROJECTS Towards multi tenant performance sl os
IEEE 2014 JAVA DATA MINING PROJECTS Towards multi tenant performance sl osIEEE 2014 JAVA DATA MINING PROJECTS Towards multi tenant performance sl os
IEEE 2014 JAVA DATA MINING PROJECTS Towards multi tenant performance sl osIEEEFINALYEARSTUDENTPROJECTS
 

Ähnlich wie Dynamic Fractional Resource Scheduling -- 2010, ARCS (20)

Virtualization Business Case
Virtualization Business CaseVirtualization Business Case
Virtualization Business Case
 
An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEW
 
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and RSvm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
 
Compare Clustering Methods for MS SQL Server
Compare Clustering Methods for MS SQL ServerCompare Clustering Methods for MS SQL Server
Compare Clustering Methods for MS SQL Server
 
Migration to Oracle 12c Made Easy Using Replication Technology
Migration to Oracle 12c Made Easy Using Replication TechnologyMigration to Oracle 12c Made Easy Using Replication Technology
Migration to Oracle 12c Made Easy Using Replication Technology
 
Scalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesScalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehouses
 
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...REAL TIME PROJECTS  IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
REAL TIME PROJECTS IEEE BASED PROJECTS EMBEDDED SYSTEMS PAPER PUBLICATIONS M...
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
A Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System UptimeA Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System Uptime
 
MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1
 
An Optimized-Throttled Algorithm for Distributing Load in Cloud Computing
An Optimized-Throttled Algorithm for Distributing Load in Cloud ComputingAn Optimized-Throttled Algorithm for Distributing Load in Cloud Computing
An Optimized-Throttled Algorithm for Distributing Load in Cloud Computing
 
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & Wieck
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & WieckIBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & Wieck
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & Wieck
 
CirrusDB Cloud Effect
CirrusDB Cloud EffectCirrusDB Cloud Effect
CirrusDB Cloud Effect
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliency
 
VMworld 2013: SDDC IT Operations Transformation: Multi-customer Lessons Learned
VMworld 2013: SDDC IT Operations Transformation:  Multi-customer Lessons LearnedVMworld 2013: SDDC IT Operations Transformation:  Multi-customer Lessons Learned
VMworld 2013: SDDC IT Operations Transformation: Multi-customer Lessons Learned
 
Error tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud systemError tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud system
 
DB2 for z/O S Data Sharing
DB2 for z/O S  Data  SharingDB2 for z/O S  Data  Sharing
DB2 for z/O S Data Sharing
 
High Availability Disaster Recovery Customer Success Stories[1]
High Availability Disaster Recovery Customer Success Stories[1]High Availability Disaster Recovery Customer Success Stories[1]
High Availability Disaster Recovery Customer Success Stories[1]
 
IEEE 2014 JAVA DATA MINING PROJECTS Towards multi tenant performance sl os
IEEE 2014 JAVA DATA MINING PROJECTS Towards multi tenant performance sl osIEEE 2014 JAVA DATA MINING PROJECTS Towards multi tenant performance sl os
IEEE 2014 JAVA DATA MINING PROJECTS Towards multi tenant performance sl os
 

Kürzlich hochgeladen

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 

Kürzlich hochgeladen (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 

Dynamic Fractional Resource Scheduling -- 2010, ARCS

  • 1. Introduction Off-Line Problem On-Line Problem Summary Appendix Dynamic Fractional Resource Scheduling for Cluster Platforms Mark Stillwell Department of Information and Computer Sciences University of Hawai’i at M¯anoa Achievement Rewards for College Scientists 2010 Scholarship Awards Program Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 2. Introduction Off-Line Problem On-Line Problem Summary Appendix Clusters Definition A cluster is a group of independent computers, or nodes, working together closely, usually connected by a high-speed network. Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 3. Introduction Off-Line Problem On-Line Problem Summary Appendix Jobs The system can accept user requests to run jobs or the administrator can instantiate jobs directly Running jobs are made up of nearly identical tasks The number of tasks is specified by user/administrator Tasks can block while communicating with each other The assignment of resources to tasks is called scheduling Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 4. Introduction Off-Line Problem On-Line Problem Summary Appendix Current Approaches Service Hosting (Off-Line scheduling) traditionally, dedicated machines current interest in server consolidation few good theoretical models heavy “engineering” bias High-Performance Computing (On-Line Scheduling) Usually First-Come-First-Served (FCFS) with backfilling Backfilling needs (unreliable) compute time estimates Unbounded wait times Inefficient use of nodes/resources Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 5. Introduction Off-Line Problem On-Line Problem Summary Appendix Our Proposal Use virtual machine technology. Multiple tasks on one node Performance isolation Sharing of fractional resources Define a run-time computable metric that captures notions of performance and fairness. Design heuristics that allocate resources to jobs while explicitly trying to achieve high ratings by our metric. Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 6. Introduction Off-Line Problem On-Line Problem Summary Appendix Requirements, Needs, and Yield Tasks have memory requirements and CPU needs All tasks of a job have the same requirements and needs For a task to be placed on a node there must be memory available at least equal to its requirements A task can be allocated less CPU than its need, and the ratio of the allocation to the need is the yield All tasks of a job must have the same yield, so we can also speak of the yield of a job The yield of a job gives its performance relative to if it were run on a dedicated system Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 7. Introduction Off-Line Problem On-Line Problem Summary Appendix Off-Line Problem Assumptions Steady-state execution with infinite jobs Models an ideal service hosting environment Makes the problem more tractable [Marchal et al., 2006] Good when job duration longer than schedule time Jobs are serial (single task) Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 8. Introduction Off-Line Problem On-Line Problem Summary Appendix Objective Function The performance of a job is correlated to its yield Maximizing the average or sum of the yields may be unfair to some jobs Instead, we seek to maximize the minimum yield Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 9. Introduction Off-Line Problem On-Line Problem Summary Appendix Task Placement Heuristics Greedy Task Placement – Incremental, puts each task on the node with the lowest computational load on which it can fit without violating memory constraints Randomized Rounding Task Placement – Based on relaxing an MILP to an LP and rounding probabilistically MCB Task Placement – Global, iteratively applies multi-capacity (vector) bin-packing heuristics during a binary search for the maximized minimum yield Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 10. Introduction Off-Line Problem On-Line Problem Summary Appendix Large Problem Set: Minimum Yield vs. Free Memory 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.3 0.35 0.4 0.45 0.5 0.55 Slack MinimumYield bound MCB8 GR GB SG SGB RRND RRNZ Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 11. Introduction Off-Line Problem On-Line Problem Summary Appendix Conclusions The problem is tractable Multi-capacity bin packing algorithms perform well Greedy algorithms perform okay, and are very fast Randomized rounding approaches are not very promising Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 12. Introduction Off-Line Problem On-Line Problem Summary Appendix On-Line Problem Assumptions Job arrival/completion times are not known in advance Jobs are temporary The user wants a final result Quick turnaround relative to runtime is desired Jobs are not interactive So can wait until resources are available to start We avoid the use of runtime estimates Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 13. Introduction Off-Line Problem On-Line Problem Summary Appendix Stretch Our goal: minimize maximum stretch (aka slowdown) Stretch: the time a job spends in the system divided by the time that would be spent in a dedicated system [Bender et al., 1998] Popular to quantify schedule quality post-mortem Not generally used to make scheduling decisions Runtime computation requires (unreliable) user estimates. Minimizing average stretch prone to starvation Minimizing maximum stretch captures notions of both performance and fairness. Our approach: try to maximize minimum yield Similar, but not the same, as minimizing maximum stretch Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 14. Introduction Off-Line Problem On-Line Problem Summary Appendix Max Stretch Degradation vs. Load, no migration cost 1 10 100 1000 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 StretchDegradationFactor Load EASY FCFS GREEDY GREEDYP GREEDYPM DynMCB8 MCB8P 600 MCB8PG 600 Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 15. Introduction Off-Line Problem On-Line Problem Summary Appendix Conclusions DFRS approaches can significantly outperform traditional approaches Aggressive repacking can lead to much better resource allocations But also to heavy migration costs Greedy migration is not that useful Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 16. Introduction Off-Line Problem On-Line Problem Summary Appendix Summary We have proposed a novel approach to job scheduling on clusters, Dynamic Fractional Resource Scheduling, that makes use of modern virtual machine technology and seeks to optimize a runtime-computable, user-centric measure of performance called the minimum yield Multi-capacity bin packing heuristics can be used to find good solutions Our approach avoids the use of unreliable runtime estimates This approach has the potential to lead to order-of-magnitude improvements in performance over current technology Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 17. Introduction Off-Line Problem On-Line Problem Summary Appendix References I Bender, M. A., Chakrabarti, S., and Muthukrishnan, S. (1998). Flow and Stretch Metrics for Scheduling Continuous Job Streams. In Proc. of the 9th ACM-SIAM Symp. On Discrete Algorithms, pages 270–279. Marchal, L., Yang, Y., Casanova, H., and Robert, Y. (2006). Steady-state scheduling of multiple divisible load applications on wide-area distributed computing platforms. Intl. J. of High Performance Computing Applications, 20(3):365–381. Mark Stillwell UH M¯anoa ICS DFRS for Clusters
  • 18. Introduction Off-Line Problem On-Line Problem Summary Appendix References II Stillwell, M., Shanzenbach, D., Vivien, F., and Casanova, H. (2009). Resource Allocation using Virtual Clusters. In CCGrid, pages 260–267. IEEE. Stillwell, M., Vivien, F., and Casanova, H. (2010). Dynamic fractional resource scheduling for HPC workloads. In IPDPS. to appear. Mark Stillwell UH M¯anoa ICS DFRS for Clusters