Alexander Fölling, Christian Grimme,
Joachim Lepping, and Alexander Papaspyrou: The Gain of Resource Delegation in Distributed Computing Environments
15th Workshop on Job Scheduling Strategies for Parallel Processing; April 23, 2010 - Atlanta, GA, USA
JSSPP 2010
1. The Gain of Resource Delegation in Distributed Computing Environments. Alexander Fölling, Christian Grimme, Joachim Lepping, and Alexander Papaspyrou. 15th Workshop on Job Scheduling Strategies for Parallel Processing, April 23, 2010, Atlanta, GA, USA
2. Outline: Motivation; System Model; Resource Delegation Policy; Evaluation Setup; Results; Conclusion and Future Work. Alexander Fölling | April 23, 2010
3. Motivation: Distributed computing infrastructures (DCIs) have reached production status. More and more users draw their computing resources from Grid and Cloud infrastructures. Many DCIs are used to capacity and produce significant revenue. Cloud infrastructures allow easy on-demand provisioning of resources (enlargement of the local resource space): Infrastructure as a Service (IaaS) through virtualization technology, with a simple access and pricing model. The temporary extension of the local resource space allows more flexible scheduling decisions. Locally, this is no longer a traditional parallel job scheduling problem on a fixed set of parallel machines (Pm, Rm, Qm models). On-demand resource leasing may improve scheduling performance.
6. System Model [Diagram; labels, each shown twice: "Workload Analysis", "Forward Jobs"]
7. System Model [Diagram; labels, each shown twice: "Workload Analysis", "Forward Jobs", "Remote Access"]
8. Properties of Resource Delegation: Resource delegation differs from centralized scheduling with multi-site execution: there is no central scheduling component, only independent sites. A scheduler cedes full control over resources to another scheduler once access is granted (for a certain period). Resource leasing enlarges the local resource space. Scheduling decisions are made exclusively by local schedulers. Resources may be used immediately or later during the leasing period. Advanced scheduling strategies may support both local allocation under varying machine sizes and planning of future resource requirements. Each participant in a DCI is both resource consumer and resource provider.
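To make the idea of a varying machine size concrete, here is a minimal sketch of how a lease could change the number of CPUs each local scheduler controls over time. The names `Lease`, `Site`, `machine_size`, and `delegate` are hypothetical and do not come from the slides; the sketch only assumes that a lease moves control of a fixed number of CPUs from a provider to a consumer for a fixed period.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Lease:
    """Hypothetical record: `cpus` ceded by `provider` to `consumer` for [start, end)."""
    provider: str
    consumer: str
    cpus: int
    start: int  # seconds
    end: int    # seconds, exclusive


@dataclass
class Site:
    name: str
    local_cpus: int
    leases: list = field(default_factory=list)  # leases this site takes part in

    def machine_size(self, t: int) -> int:
        """CPUs the local scheduler controls at time t."""
        size = self.local_cpus
        for lease in self.leases:
            if lease.start <= t < lease.end:
                if lease.consumer == self.name:
                    size += lease.cpus  # leased in: local resource space grows
                if lease.provider == self.name:
                    size -= lease.cpus  # leased out: control is ceded
        return size


def delegate(provider: Site, consumer: Site, cpus: int, start: int, end: int) -> Lease:
    """Record one delegation on both participating sites."""
    lease = Lease(provider.name, consumer.name, cpus, start, end)
    provider.leases.append(lease)
    consumer.leases.append(lease)
    return lease


# Each site acts as both consumer and provider.
site1 = Site("site1", local_cpus=4)
site2 = Site("site2", local_cpus=8)
delegate(provider=site2, consumer=site1, cpus=2, start=0, end=100)
delegate(provider=site1, consumer=site2, cpus=1, start=200, end=300)
```

In this sketch no central component exists: each site computes its own machine size from the leases it participates in, which mirrors the point that scheduling decisions stay with the local schedulers.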
9. Submission Triggered Resource Delegation Policy (ST-RDP) [Diagram: Site 1 with a queue (entries 1, 2, 3) and a schedule; Site 2 with a schedule]
10. Submission Triggered Resource Delegation Policy (ST-RDP) [Diagram adds: entry 4 arrives at Site 1, requiring 4 CPUs for 100 seconds; action "Forward Job to LRMS"]
11. Submission Triggered Resource Delegation Policy (ST-RDP) [Diagram adds: 2 CPUs idle at Site 1; step "Check Resource Availability"]
12. Submission Triggered Resource Delegation Policy (ST-RDP) [Diagram adds: branch "[Enough]"]
13. Submission Triggered Resource Delegation Policy (ST-RDP) [Diagram adds: branch "[Not Enough]" leading to "Try to Lend Resource Deficiency"; "Request 2 CPUs for 100 seconds" addressed to Site 2]
14. Submission Triggered Resource Delegation Policy (ST-RDP) [Diagram adds: outcome "[Request Denied]"]
15. Submission Triggered Resource Delegation Policy (ST-RDP) [Diagram adds: outcome "[Request Accepted]" leading to "Prioritize New Job"]
16. Submission Triggered Resource Delegation Policy (ST-RDP) [Complete flow: "Check Resource Availability"; "[Enough]"; "[Not Enough]" leading to "Try to Lend Resource Deficiency" at Site 2; "[Request Accepted]" leading to "Prioritize New Job"; "[Request Denied]"; "Forward Job to LRMS"; queue entries 1, 2, 3 with labels 4a and 4b]
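The decision flow on the ST-RDP slides can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `Job`, `Decision`, `st_rdp`, and the `request_lease` callback are hypothetical names, and the slide text does not spell out what happens after a denied request, so the sketch assumes the job is simply forwarded to the LRMS without priority.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Job:
    cpus: int
    runtime: int  # seconds


@dataclass(frozen=True)
class Decision:
    leased_cpus: int   # CPUs obtained from the remote site
    prioritized: bool  # whether the job is prioritized before going to the LRMS


def st_rdp(job: Job, idle_cpus: int,
           request_lease: Callable[[int, int], bool]) -> Decision:
    """One submission-triggered decision at the local site.

    request_lease(cpus, seconds) stands in for asking the remote site;
    it returns True if the request is accepted.
    """
    # Check resource availability.
    if job.cpus <= idle_cpus:
        return Decision(0, False)          # [Enough]: forward job to LRMS
    deficiency = job.cpus - idle_cpus      # [Not Enough]
    if request_lease(deficiency, job.runtime):
        return Decision(deficiency, True)  # [Request Accepted]: prioritize new job
    return Decision(0, False)              # [Request Denied]: assumed unprioritized


requests = []


def accepting_site(cpus: int, seconds: int) -> bool:
    """Toy remote site that records each request and always accepts."""
    requests.append((cpus, seconds))
    return True


# The example from the slides: 4 CPUs for 100 seconds, 2 CPUs idle locally.
job = Job(cpus=4, runtime=100)
accepted = st_rdp(job, idle_cpus=2, request_lease=accepting_site)
denied = st_rdp(job, idle_cpus=2, request_lease=lambda cpus, seconds: False)
local = st_rdp(job, idle_cpus=4, request_lease=accepting_site)
```

With 2 idle CPUs the sketch requests exactly the deficiency, 2 CPUs for 100 seconds, which matches the request shown in the diagram. When enough CPUs are idle, the remote site is never contacted.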
17. Evaluation Setup: Input data: real workload traces from the Parallel Workloads Archive (KTH, CTC, SDSC05), with about 100 to 1600 CPUs and about 28,000 to 74,000 jobs (first 11 months). Local resource management system: EASY backfilling. Evaluation objectives: improvements in AWRT and reconfiguration behavior.
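The slides do not define AWRT. A minimal sketch, assuming it denotes the average weighted response time in which each job's response time (completion minus submission) is weighted by its resource consumption (CPUs times runtime), could look like this. `FinishedJob` and `awrt` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinishedJob:
    release: float     # submission time
    completion: float  # completion time
    runtime: float     # processing time in seconds
    cpus: int          # number of CPUs used


def awrt(jobs):
    """Average response time, weighted by resource consumption (assumed definition)."""
    total_weight = sum(j.runtime * j.cpus for j in jobs)
    if total_weight == 0:
        raise ValueError("no resource consumption to weight by")
    weighted = sum(j.runtime * j.cpus * (j.completion - j.release) for j in jobs)
    return weighted / total_weight


jobs = [
    FinishedJob(release=0, completion=100, runtime=100, cpus=4),  # weight 400, response 100
    FinishedJob(release=0, completion=300, runtime=100, cpus=1),  # weight 100, response 300
]
print(awrt(jobs))  # (400*100 + 100*300) / 500 = 140.0
```

Under this definition a lower value is better, so an improvement in AWRT means a reduction. The weighting keeps a large job's long wait from being hidden by many small jobs that start quickly.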
19. Results: Reconfiguration behavior, KTH and CTC, 11 months with ST-RDP [Figure: two panels, KTH and CTC; x-axis: time in months; y-axis: CPUs]
20. Conclusion: Proposed a new concept for resource delegation in DCIs: parallel job scheduling problems under varying machine sizes, where resource requirements can be flexibly negotiated among participants. Evaluated a simple resource delegation method that needs no further information exchange and is robust in changing environments. Results show significant benefits for local scheduling (improvement in AWRT). During operation, many resources are delegated among sites.
21. Future Work: Application to larger DCI environments. Consideration of additional location policies that decide which site to ask first for delegation. Long-term planning of resource leasing and delegation: not only single-job decisions; decisions should be based on workload records (user behavior, submission patterns, etc.) and, eventually, on predicted user behavior. Consideration of additional parameters such as local queue and schedule status.
22. Thank You. Alexander Fölling (alexander.foelling@udo.edu), Christian Grimme (christian.grimme@udo.edu), Joachim Lepping (joachim.lepping@udo.edu), Alexander Papaspyrou (alexander.papaspyrou@udo.edu). Robotics Research Institute, Information Technology Section, TU Dortmund University, Germany. http://www.it.irf.de