Simulating the usage of SLAs for job scheduling in an HPC environment
Roland Kübert
Höchstleistungsrechenzentrum Stuttgart
January 31, 2010
Outline
1 Introduction
2 Job Scheduling - with and without SLAs
3 Simulating SLA-based scheduling
4 Conclusions and next steps
5 Discussion
1 Introduction
Motivation
HPC services are offered only on a best-effort basis
Scheduling parameters are few and trivial
Work on SLAs has been performed at HLRS, but at a higher level
Job scheduling
scheduling: “to plan (something) at a certain time”
Scheduling is used in many fields
Job scheduling assigns computational jobs to processing units
Service Level Agreements in one sentence
“The purpose of [a] Service Level Agreement (SLA) is to define
the services and responsibilities of the [service provider] and its
clients.” (Michigan State University High Performance Computing
Center Service Level Agreement)
2 Job Scheduling - with and without SLAs
Classical job scheduling
Objective is mostly to maximize utilization or minimize waiting time
Various algorithms with different advantages
Either schedule-based or queue-based
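As an illustration of a queue-based approach, the following minimal sketch schedules jobs first-come-first-served on identical processing units and reports the two objectives named above. It is not taken from the talk: the `Job` fields, the function names, the one-unit-per-job simplification and the four sample jobs are all assumptions made for this example.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    arrival: float  # submission time
    runtime: float  # execution time on one processing unit (assumed known)


def fcfs(jobs, units):
    """Queue-based FCFS: each job, in arrival order, starts on the unit
    that becomes free first. Returns a dict of job name -> start time."""
    free_at = [0.0] * units  # min-heap of times at which each unit is free
    heapq.heapify(free_at)
    starts = {}
    for job in sorted(jobs, key=lambda j: j.arrival):
        unit_free = heapq.heappop(free_at)
        start = max(unit_free, job.arrival)
        starts[job.name] = start
        heapq.heappush(free_at, start + job.runtime)
    return starts


def mean_wait(jobs, starts):
    """Average time between submission and start of execution."""
    return sum(starts[j.name] - j.arrival for j in jobs) / len(jobs)


def utilization(jobs, starts, units):
    """Busy unit-time divided by available unit-time over the makespan."""
    end = max(starts[j.name] + j.runtime for j in jobs)
    begin = min(j.arrival for j in jobs)
    return sum(j.runtime for j in jobs) / (units * (end - begin))


# Hypothetical sample workload, not from any real trace.
jobs = [Job("a", 0, 4), Job("b", 0, 2), Job("c", 1, 3), Job("d", 2, 1)]
starts = fcfs(jobs, units=2)
print(mean_wait(jobs, starts), utilization(jobs, starts, units=2))
```

For these four sample jobs on two units, the sketch reports a mean waiting time of 0.75 and a utilization of 1.0. Real HPC jobs usually request several processing units at once, which this sketch deliberately leaves out.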
Job scheduling - with SLAs
A quite popular field
Two main streams
SLAs per job
Trivial QoS parameters (timing and resource requirements)
Relies on precise specification of job execution times
3 Simulating SLA-based scheduling
Simulating SLA-based job scheduling
Just implementing some scheduling algorithm won’t work
Production use is not possible without prior investigation
Therefore, use a simulation tool: Alea
Alea needs to be extended in order to investigate SLAs
Alea’s features
Supports different workload formats
Various scheduling algorithms already implemented
Visualization features
Free software (LGPL)
Alea’s graphs
Figure: Screenshot of Alea
Alea’s shortcomings
Many hard-coded settings (magic numbers)
Extensibility was not foreseen
Not really user-friendly
No further development
Alea’s architecture
Figure: High-level architecture of Alea 2.1
Simulation of service levels
Simulation of three different service levels: gold, silver, bronze
Different service level distributions were generated and simulated against a workload trace (San Diego Supercomputer Center’s Blue Horizon, 144 nodes x 8 CPUs)
Investigated changes in waiting time with different distributions of service levels
Example: Gold-Silver-Bronze 0-0-100, 0-5-95, 1-4-95, 2-3-95, etc.
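The example distributions above follow a simple pattern: a fixed percentage of prioritized jobs is split between gold and silver, and the rest are bronze. A minimal sketch of how such distributions could be enumerated and applied to a list of jobs is given below. The function names, the random assignment and the seed are assumptions for illustration; the talk does not say how levels were assigned to individual jobs.

```python
import random


def distributions(prioritized_percent):
    """All gold-silver-bronze splits (in percent) in which
    gold + silver == prioritized_percent."""
    bronze = 100 - prioritized_percent
    return [
        (gold, prioritized_percent - gold, bronze)
        for gold in range(prioritized_percent + 1)
    ]


def assign_levels(job_count, split, seed=0):
    """Draw one service level per job, weighted by the given split.
    Because this is random sampling, the realized shares only
    approximate the requested percentages."""
    rng = random.Random(seed)
    return rng.choices(["gold", "silver", "bronze"], weights=split, k=job_count)


def label(split):
    """Format a split the way the slide writes it, e.g. '1-4-95'."""
    return "-".join(str(part) for part in split)


# Baseline without prioritized jobs, then all splits with 5% prioritized jobs.
for split in distributions(0) + distributions(5):
    print(label(split))
```

The loop prints 0-0-100 followed by 0-5-95, 1-4-95, 2-3-95 and so on up to 5-0-95, matching the series quoted on the slide.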
Simulation results
Machine usage did not change
Introducing service levels increases the average wait time
Increasing the number of prioritized jobs increases the wait time for lower-priority classes
Ensuring that not too many high-priority jobs exist enables the service provider to give “soft” guarantees on wait time
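These effects can be reproduced qualitatively with a toy model. The sketch below is not Alea and does not use the Blue Horizon workload: it assumes a single processing unit, non-preemptive priority scheduling, and a hand-picked workload of four jobs. All names and numbers are hypothetical, and on other workloads the average wait time may change in either direction.

```python
import heapq
from collections import defaultdict

RANK = {"gold": 0, "silver": 1, "bronze": 2}  # lower rank is served first


def simulate(jobs):
    """jobs: list of (arrival, runtime, level) tuples.
    Returns (makespan, mean wait time per service level)."""
    pending = sorted(jobs)  # ordered by arrival time
    queue = []
    waits = defaultdict(list)
    clock, i = 0.0, 0
    while i < len(pending) or queue:
        if not queue and pending[i][0] > clock:
            clock = pending[i][0]  # unit idles until the next arrival
        while i < len(pending) and pending[i][0] <= clock:
            arrival, runtime, level = pending[i]
            # Ties within a level are broken by arrival time, then by index.
            heapq.heappush(queue, (RANK[level], arrival, i, runtime, level))
            i += 1
        _, arrival, _, runtime, level = heapq.heappop(queue)
        waits[level].append(clock - arrival)
        clock += runtime
    return clock, {lvl: sum(w) / len(w) for lvl, w in waits.items()}


# Hypothetical workload: (arrival, runtime) pairs.
base = [(0, 4), (1, 1), (2, 1), (3, 3)]
all_bronze = [(a, r, "bronze") for a, r in base]
one_gold = [(a, r, "gold" if (a, r) == (3, 3) else "bronze") for a, r in base]

print(simulate(all_bronze))
print(simulate(one_gold))
```

In this toy workload both runs finish at time 9, so machine usage is unchanged. Promoting the last job to gold lowers its wait from 3 to 1, while the mean wait of the bronze jobs rises from 2.25 to 4.0 and the overall mean rises from 2.25 to 3.25.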
4 Conclusions and next steps
Conclusions
Using SLAs for scheduling is possible (duh)
Can range from trivial to complex
Simulation is a good way to examine different parameters, combinations, workloads, objective functions, ...
Publication has been accepted at PARENG 2011
Next steps
Improvements on Alea
Conceptual implementation
Queue-based versus schedule-based algorithms
Additional, more complex service levels
5 Discussion
Questions
Figure: The Flammarion engraving