Outline
                                Introduction
     Job Scheduling - with and without SLAs
           Simulating SLAs-based scheduling
                  Conclusions and next steps
                                  Discussion




Simulating the usage of SLAs for job scheduling in
              an HPC environment

                                  Roland Kübert

                     Höchstleistungsrechenzentrum Stuttgart

                                January 31, 2010




                             Roland Kübert         Simulating the usage of SLAs for job scheduling in an HPC environment




1   Introduction

2   Job Scheduling - with and without SLAs

3   Simulating SLAs-based scheduling

4   Conclusions and next steps

5   Discussion






Motivation




     HPC services are offered only on a best-effort basis
     Scheduling parameters are few and trivial
     Work on SLAs has been performed at HLRS,
     but only on a higher level






Job scheduling




      scheduling: “to plan (something) at a certain time”
      Scheduling is used in many fields
      Job scheduling assigns computational jobs to processing units






Service Level Agreements in one sentence




  “The purpose of [a] Service Level Agreement (SLA) is to define
  the services and responsibilities of the [service provider] and its
  clients.” (Michigan State University High Performance Computing
  Center Service Level Agreement)






Classical job scheduling




      Objective is mostly to maximize utilization or minimize
      waiting time
      Various algorithms with different advantages
      Either schedule-based or queue-based
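
The two objectives named above can be made concrete with a small sketch. It assumes a queue-based first-come-first-served policy without backfilling on a machine with identical CPUs; the `Job` class, the `fcfs` function and the sample numbers are hypothetical illustrations and are not taken from the talk or from any particular scheduler.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    arrival: float  # submission time
    runtime: float  # execution time once started
    cpus: int       # number of CPUs requested


def fcfs(jobs, total_cpus):
    """Queue-based FCFS without backfilling (illustrative assumption).

    Returns (average waiting time, utilization), where utilization is
    the busy CPU time divided by the CPU time available between the
    first arrival and the last completion.
    """
    ordered = sorted(jobs, key=lambda j: j.arrival)
    free = total_cpus
    running = []  # min-heap of (end_time, cpus)
    clock = ordered[0].arrival
    waits, busy_cpu_time, last_end = [], 0.0, clock
    for job in ordered:
        if job.cpus > total_cpus:
            raise ValueError("job requests more CPUs than the machine has")
        clock = max(clock, job.arrival)
        # Release CPUs of jobs that have already finished.
        while running and running[0][0] <= clock:
            free += heapq.heappop(running)[1]
        # Strict FCFS: the head of the queue blocks until it fits.
        while free < job.cpus:
            end, cpus = heapq.heappop(running)
            clock = max(clock, end)
            free += cpus
        waits.append(clock - job.arrival)
        free -= job.cpus
        heapq.heappush(running, (clock + job.runtime, job.cpus))
        busy_cpu_time += job.runtime * job.cpus
        last_end = max(last_end, clock + job.runtime)
    span = last_end - ordered[0].arrival
    return sum(waits) / len(waits), busy_cpu_time / (total_cpus * span)


if __name__ == "__main__":
    demo = [Job(0, 10, 4), Job(0, 10, 4), Job(1, 5, 8)]
    print(fcfs(demo, total_cpus=8))
```

A schedule-based algorithm would instead plan start times for all known jobs in advance; that variant is not shown here.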






Job scheduling - with SLAs




      A quite popular field
      Two main streams
           SLAs per job
           Trivial QoS parameters (Timing and resource requirements)
      Relies on precise specification of job execution times






Simulating SLA-based job scheduling




     Simply implementing some scheduling algorithm won’t work
     Production use is not possible without prior investigation
     Therefore, use a simulation tool: Alea
     Alea needs to be extended in order to investigate SLAs






Alea’s features




      Supports different workload formats
      Various scheduling algorithms already implemented
      Visualization features
      Free software (LGPL)






Alea’s graphs




                             Figure: Screenshot of Alea


Alea’s shortcomings




      Many hard-coded settings (magic numbers)
      Not designed for extensibility
      Not really user-friendly
      No further development






Alea’s architecture




                   Figure: High-level architecture of Alea 2.1






Simulation of service levels


      Simulation of three different service levels: gold, silver, bronze
      Different service level distributions were generated and
      simulated against a workload (San Diego
      Supercomputer Center’s Blue Horizon, 144 nodes x 8 CPUs)
      Investigated changes in waiting time under different
      distributions of service levels
      Example (Gold-Silver-Bronze): 0-0-100, 0-5-95, 1-4-95, 2-3-95,
      etc.
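
The Gold-Silver-Bronze percentages listed above can be enumerated and applied to a workload roughly as follows. This is only a sketch: the function names, the fixed bronze share, the one-percent step and the random assignment of levels to jobs are assumptions made for illustration, since the slides do not state how levels were assigned.

```python
import random


def distributions(bronze_share, step=1):
    """All (gold, silver, bronze) percentage triples for a fixed bronze share."""
    rest = 100 - bronze_share
    return [(gold, rest - gold, bronze_share) for gold in range(0, rest + 1, step)]


def assign_levels(n_jobs, dist, seed=0):
    """Label n_jobs jobs so that class sizes follow dist (percentages).

    Shuffling the labels with a fixed seed is an assumption; it only
    makes the illustration reproducible.
    """
    gold, silver, bronze = dist
    if gold + silver + bronze != 100:
        raise ValueError("percentages must sum to 100")
    n_gold = round(n_jobs * gold / 100)
    n_silver = min(round(n_jobs * silver / 100), n_jobs - n_gold)
    n_bronze = n_jobs - n_gold - n_silver
    labels = ["gold"] * n_gold + ["silver"] * n_silver + ["bronze"] * n_bronze
    random.Random(seed).shuffle(labels)
    return labels


if __name__ == "__main__":
    for dist in distributions(95):
        labels = assign_levels(200, dist)
        print(dist, labels.count("gold"), labels.count("silver"), labels.count("bronze"))
```

Each resulting label list could then be attached to the jobs of a workload before a simulation run.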






Simulation results



      Machine usage did not change
      Introducing service levels increases the average wait time
      Increasing the number of prioritized jobs increases the wait time
      for lower-prioritized classes
      Limiting the number of high-priority jobs enables
      the service provider to give “soft” guarantees on wait time
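
The effect on lower-prioritized classes can be illustrated with a toy model. The sketch below assumes non-preemptive priority scheduling on a single resource; it is not Alea and not the setup used for the results above, and the function name, the priority mapping and the three-job workload are hypothetical.

```python
import heapq
from collections import defaultdict

# Smaller number means higher priority (assumed ordering).
PRIORITY = {"gold": 0, "silver": 1, "bronze": 2}


def mean_wait_by_level(jobs):
    """Mean waiting time per service level.

    jobs: iterable of (arrival, runtime, level) tuples. One job runs at
    a time; whenever the resource becomes free, the waiting job with the
    highest priority (earliest arrival on ties) is started.
    """
    pending = sorted(jobs, key=lambda j: j[0])
    queue = []  # heap of (priority, arrival, index, runtime, level)
    waits = defaultdict(list)
    clock, i = 0.0, 0
    while i < len(pending) or queue:
        # Idle resource: jump ahead to the next arrival.
        if not queue and pending[i][0] > clock:
            clock = pending[i][0]
        # Enqueue every job that has arrived by now.
        while i < len(pending) and pending[i][0] <= clock:
            arrival, runtime, level = pending[i]
            heapq.heappush(queue, (PRIORITY[level], arrival, i, runtime, level))
            i += 1
        _, arrival, _, runtime, level = heapq.heappop(queue)
        waits[level].append(clock - arrival)
        clock += runtime
    return {level: sum(w) / len(w) for level, w in waits.items()}


if __name__ == "__main__":
    all_bronze = [(0, 10, "bronze"), (1, 10, "bronze"), (2, 10, "bronze")]
    one_gold = [(0, 10, "bronze"), (1, 10, "bronze"), (2, 10, "gold")]
    print(mean_wait_by_level(all_bronze))
    print(mean_wait_by_level(one_gold))
```

In this toy workload, promoting the last job to gold lets it overtake one waiting bronze job, which raises the mean bronze wait.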






Conclusions



     Using SLAs for scheduling is possible (duh)
     Can range from trivial to complex
     Simulation is a good way to examine different parameters,
     combinations, workloads, objective functions, ...
     Publication has been accepted at PARENG 2011






Next steps




      Improvements on Alea
      Conceptual implementation
      Queue-based against schedule-based algorithms
      Additional, more complex service levels






Questions




                          Figure: The Flammarion wood engraving