Scheduling with Torque-Maui – A Tutorial
Contents
  The problem being addressed
  Torque – how it helps
  Maui – how it helps
  Job Submission – job priorities, job dependencies, job queues
  Job Monitoring
  Job Accounting
  Install

The problem
  Run jobs/tasks as soon as possible.
  Run higher-priority jobs earlier than others.
  Run jobs automatically on any free machine across the cluster, not just on one machine.
  Run jobs unattended and report errors.
  Keep machine utilization high.
  Monitor and account for all usage.

Torque – how it helps
TORQUE's job as the resource manager:
  Accepting and starting jobs/tasks across a batch farm (qsub command)
  Cancelling jobs (qdel command)
  Monitoring the state of jobs (qstat command)
  Collecting return codes (qstat)
  Accounting of jobs: the time they took, memory used, etc. (tracejob command)

Maui – how it helps
What is MAUI's job? MAUI makes all the scheduling decisions. To decide whether a job should be started, it asks questions like:
  Are there enough resources to start the job?
  Given all the jobs I could start, which one should I start?
MAUI runs a scheduling iteration:
  When a job is submitted.
  When a job ends.
  At regular, configurable intervals.

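The two questions above can be illustrated with a toy selection step. This is only a sketch of the idea, not Maui's actual algorithm: the function name pick_next_job, the three-column input format and the job names are all assumptions made for the example.

```shell
# Toy illustration only: pick the highest-priority job that fits in the
# free processors. Maui's real priority calculation is far richer.
pick_next_job() {
    # $1 = number of free processors
    # stdin = one job per line: "<jobid> <priority> <procs_needed>"
    awk -v free="$1" '
        ($3 + 0) <= (free + 0) && (best == "" || ($2 + 0) > prio) {
            best = $1          # remember the best candidate so far
            prio = $2 + 0
        }
        END { if (best != "") print best }
    '
}

# Hypothetical queue contents: jobA does not fit, jobB outranks jobC.
printf '%s\n' 'jobA 10 8' 'jobB 20 4' 'jobC 5 2' | pick_next_job 4   # prints jobB
```

If no job fits in the free processors, the function prints nothing, which corresponds to the scheduler starting nothing in that iteration.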
Job Submission
Jobs are submitted to the batch system with the qsub command, as in
    qsub job.sh
You can also add a resource description directly on the command line:
    qsub -l nodes=1:ppn=4,mem=200mb,walltime=120 job.sh
qsub returns a <jobid>.

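The same resource request can also live inside the job script as #PBS directive lines, which qsub reads as if they were command-line options. The sketch below assumes a job named example_job and a trivial payload; both are placeholders, and the submit step only runs where qsub is installed.

```shell
# Write a small job script. Lines starting with "#PBS" are read by qsub
# as if the same options had been given on the command line.
cat > job.sh <<'EOF'
#!/bin/sh
#PBS -N example_job
#PBS -l nodes=1:ppn=4,mem=200mb,walltime=120
# Torque sets PBS_O_WORKDIR to the directory qsub was run from.
cd "${PBS_O_WORKDIR:-.}" || exit 1
echo "job started in $(pwd)"
EOF

# Submit only if qsub is on PATH; qsub prints the new job id on stdout.
if command -v qsub >/dev/null 2>&1; then
    jobid=$(qsub job.sh) && echo "submitted as $jobid"
fi
```

Keeping the request in the script means every submission of job.sh asks for the same resources; options given on the qsub command line still take precedence over the directives.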
Job priority
Give a job a priority with qsub:
    qsub -p 20 job.sh
The default priority is 0. You can give a job a priority from 0 to 1023 (qsub also accepts negative values down to -1024).

Job dependencies
Run a job after another job ends successfully:
    echo "vflush" | qsub -W depend=afterok:10.penguin7.orchesys.com -p 10 -q flush_queue
Here '10.penguin7.orchesys.com' is the jobid of another job; the current job is launched only after that job completes successfully.

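When a job has to wait for several earlier jobs, their ids can be joined with colons after afterok. The helper below is a hypothetical convenience (afterok_cmd is not a Torque command); it only prints the qsub command line so it can be checked before being run.

```shell
# Print (do not run) the qsub command that submits a script after one
# or more other jobs have ended successfully.
afterok_cmd() {
    # $1 = job script, remaining arguments = job ids to wait for
    [ "$#" -ge 2 ] || { echo "usage: afterok_cmd <script> <jobid>..." >&2; return 1; }
    script=$1
    shift
    deps=$(printf ':%s' "$@")      # ":id1:id2:..."
    printf 'qsub -W depend=afterok%s %s\n' "$deps" "$script"
}

afterok_cmd job.sh 10.penguin7.orchesys.com
# prints: qsub -W depend=afterok:10.penguin7.orchesys.com job.sh
```

In a real chain the job id would usually come from the previous submission, for example by capturing the output of the first qsub call in a shell variable and passing it to the second.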
Job Queues
Batch systems are usually configured with multiple queues. Each queue can be configured to accept jobs from a certain group of users, or within specified resource limits.
Queue selection is performed with -q queuename on the qsub command line.
Glassbeam has a default queue (batch) and flush_queue (where only one job can run at a time).

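A queue that runs one job at a time can be described with qmgr directives. The sketch below is an assumption about how such a queue might be defined, not a record of the actual Glassbeam configuration; it only prints the directives so they can be reviewed first.

```shell
# Queue definition in qmgr syntax, kept in a variable so it can be
# reviewed before it is applied. max_running = 1 limits the queue to
# one running job at a time.
queue_def='create queue flush_queue queue_type=execution
set queue flush_queue max_running = 1
set queue flush_queue enabled = True
set queue flush_queue started = True'

printf '%s\n' "$queue_def"

# To apply on the Torque server host (requires manager rights):
#   printf '%s\n' "$queue_def" | qmgr
```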
Job Monitoring
For a job id, you can see the command that was run for the job in the file /var/spool/torque/server_priv/jobs/<JOBID>.SC:
    sudo cat 90.localhost.localdomain.SC
    /home/gbprod/testscript_aruba/aruba_parallel_loader qa0 1306219430 aruba_test_pod /glassbeam/core/bin
Status of all submitted jobs:
    qstat
Status of only one job:
    qstat <jobid>
Only running jobs:
    qstat -r
Email alert for jobs:
    qsub -m ae -M <email-address>
(-m ae sends email when the job is aborted (a) or when it ends (e).)

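For scripting, the single-letter job state can be pulled out of the full status listing that qstat -f prints. The filter below is a small sketch; job_state is a hypothetical helper name, and the sample input in the example is a shortened stand-in for real qstat -f output.

```shell
# Read "qstat -f <jobid>" output on stdin and print the job state
# letter (for example Q = queued, R = running, C = completed).
job_state() {
    awk -F' = ' '$1 ~ /job_state/ { print $2; exit }'
}

# Shortened sample of the "name = value" lines that qstat -f prints.
printf '%s\n' 'Job Id: 90.localhost.localdomain' \
              '    Job_Name = STDIN' \
              '    job_state = R' | job_state     # prints R

# On a machine with Torque installed:
#   qstat -f <jobid> | job_state
```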
Job accounting
tracejob reports a job's return status and how much time it took, and shows what happened to the job id today:
    tracejob <jobid>
Search the last d days for the job:
    tracejob -n d <jobid>
Fast version of tracejob (each -f filters out one category of log events):
    tracejob -f error -f system -f admin -f security -f sched -f debug -f debug2 -f job -f job_usage 114.localhost

Job accounting
Tracejob output:
    Job: 114.localhost.localdomain

    05/30/2011 05:25:15  A    queue=batch
    05/30/2011 05:25:15  A    user=gbprod group=glassbeam jobname=STDIN queue=batch ctime=1306747515 qtime=1306747515 etime=1306747515
                              start=1306747515 owner=gbprod@localhost.localdomain exec_host=localhost/0 Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1
    05/30/2011 05:25:25  A    user=gbprod group=glassbeam jobname=STDIN queue=batch ctime=1306747515 qtime=1306747515 etime=1306747515
                              start=1306747515 owner=gbprod@localhost.localdomain exec_host=localhost/0 Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1
                              session=26992 end=1306747525 Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:10

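The accounting records are space-separated key=value pairs, so single fields such as the exit status are easy to extract. The helper below is a sketch; acct_field is a hypothetical name, and the record is a shortened copy of the final accounting line shown above.

```shell
# Print the value of one key from an accounting record read on stdin.
acct_field() {
    # $1 = key to look up, e.g. Exit_status
    tr -s ' ' '\n' |
        awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Shortened copy of the end-of-job record from the tracejob output.
record='user=gbprod group=glassbeam jobname=STDIN queue=batch start=1306747515 end=1306747525 Exit_status=0 resources_used.walltime=00:00:10'

printf '%s\n' "$record" | acct_field Exit_status               # prints 0
printf '%s\n' "$record" | acct_field resources_used.walltime   # prints 00:00:10

# Elapsed seconds from the epoch timestamps (end - start).
start=$(printf '%s\n' "$record" | acct_field start)
end=$(printf '%s\n' "$record" | acct_field end)
echo "$((end - start))"                                        # prints 10
```

The elapsed time computed from start and end agrees with the resources_used.walltime value of 00:00:10 in the record.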
Install
Torque install
  As root user, go to folder install/torque-gb-3.0.1
  Run command:
      ./torque.setup gbprod localhost
Maui install
  As root user, go to folder install/maui-gb-3.3.1
  Run command:
      sh install.sh
