PBS and Scheduling at NCI
The Past, Present and Future
Andrew Wellington
Senior HPC Systems Administrator
September 2015
Overview
• What is NCI
• The Past
• Our Machines
• PBS
• The Present
• The Future
• Questions
NCI - An Overview
• NCI is Australia’s national high-performance computing service
• comprehensive, vertically-integrated research service
• providing national access on priority and merit
• driven by research objectives
• Operates as a formal collaboration of ANU, CSIRO, the Australian Bureau of Meteorology and Geoscience
Australia
• As a partnership with a number of research-intensive Universities, supported by the Australian Research
Council.
Where are we located?
• Nation’s capital — Canberra, ACT
• At its National University — The Australian National University (ANU)
Research Communities
Research focus areas
• Climate Science and Earth System Science
• Astronomy (optical and theoretical)
• Geosciences: Geophysics, Earth Observation
• Biosciences & Bioinformatics
• Computational Sciences
• Engineering
• Chemistry
• Physics
• Social Sciences
• Growing emphasis on data-intensive computation
• Cloud Services
• Earth System Grid
Who Uses NCI?
• 3,000+ users
• 10 new users every week
• 600+ projects
Astrophysics, Biology, Climate & Weather, Oceanography, Particle Physics,
Fluid Dynamics, Materials Science, Chemistry, Photonics, Mathematics,
Image Processing, Geophysics, Engineering, Remote Sensing, Bioinformatics,
Environmental Science, Geospatial, Hydrology, Data Mining
What do they use it for?
Past Machines - SC
• 126 nodes with 4x 1GHz Alpha CPUs (504 CPUs)
• Between 4 and 16 GB RAM per node
• Total 700GB RAM, 2.88TB global disk, 13.1TB total disk
• Theoretical peak over 1 Tflop
• Linpack at 820 Gflops
• Quadrics “Elan3” interconnect
• 250 Mbytes/sec bidirectional
Past Machines - LC
• 152 nodes with 2.66GHz Pentium 4
• 1GB RAM per node
• Theoretical peak over 800 Gflops
• 1.4TB global storage, 16TB total disk
• Gigabit ethernet interconnect
Past Machines - AC
• 1928 1.6GHz Itanium2 processors
• Grouped into 30 partitions with 64 processors each
• Total 5.6TB RAM, 30TB global disk, 47TB total disk
• Theoretical peak over 11 Tflops
• SGI NUMAlink4 interconnect
Past Machines - XE
• 156 nodes with 2x quad-core 3.0GHz Xeon Harpertown (1248 cores)
• Between 16 and 32 GB RAM per node
• Total 2.7TB RAM, 54TB global disk, 130TB total disk
• 32 NVIDIA Tesla Fermi GPUs (16 Tflops)
• Theoretical peak almost 15 Tflops
• DDR Infiniband interconnect
Past Machines - Vayu
• 1492 nodes with 2x quad-core 2.93GHz Xeon Nehalem (11,936 cores)
• Between 24 and 96 GB RAM per node
• Total 37TB RAM, 800TB global disk
• Theoretical peak approx 140 Tflops
• QDR Infiniband interconnect
Current Machine - Raijin
• 3592 nodes with 2x 8-core 2.6GHz Xeon Sandy Bridge (57,472 cores)
• Between 32 and 128 GB RAM per node
• Total 160TB RAM, 10 PB global disk
• Theoretical peak approx 1.2 PFlops
• FDR Infiniband interconnect
• Around 52km (32 miles) of IB cabling
• 1.5 MW power; 100 tonnes of water in cooling
• Access to global Lustre filesystems (21 PB+)
Current Machine - Tenjin Cloud
• Dell C8000 based high performance cloud
• 100 nodes with 2x 8-core 2.6GHz Xeon Sandy Bridge (1600 cores)
• 128GB RAM per node
• Over 12TB main memory, 650TB Ceph global storage
• Access to global Lustre filesystems (21 PB+)
• OpenStack management
The Past - ANUPBS
• A heavily customised fork of OpenPBS v2.3
• Adds a lot of new commands
• jobnodes, jobs_on_node, nqstat, pbs_rusage, pbsrsh, pestat, qcat, qcp, qls, qps, qwait
• Modified a number of commands to add new options
• pbsdsh, pbsnodes, qalter, qdel, qorder, qrerun, qrun, qsub
• Unique scheduling algorithm
• Tight integration with local accounting and allocation system
• License shadow daemon integration for tracking usage of licensed software
• Support for local “jobfs” filesystem
• Support for cpusets (part of cgroups now)
The Past - More Features
• The concept of draining the system or individual nodes
• Basically dedicated time options but for individual nodes
• Configuring nodes for maximum walltimes for jobs
• E.g. this node only runs jobs with a walltime of less than 2 hours
The Past - Accounting
• ANUPBS tightly integrated with “RASH” (Resource Accounting Shell)
• Allowed ANUPBS to make scheduling decisions based on accounting data
• RASH integration with our systems allowed user access to be linked to accounting
• Tight integration meant it was difficult to port RASH forward to PBS Pro
The Past - Suspend/Resume
• Scheduler only thinks in terms of suspend/resume
• Every job can potentially suspend other jobs, not just “express” jobs
• Advantages of suspend/resume:
• Large parallel jobs not requiring reserved nodes
• Debug or express jobs not requiring reserved nodes
• Long running jobs not preventing other jobs from running
• Disadvantages
• Possible excessive paging if not managed correctly
• Too many suspended jobs with too few queued jobs may leave nodes idle
The Past - Suspend/Resume
• Operation of the suspend/resume scheduler
• In general jobs in the same queue have the same priority
• A pairwise comparison (preemptor/preemptee) of all job pairs
• Consider many factors:
• Relative walltime, ncpus, time already suspended, existing resource usage by
user and project, how close jobs are to completion, etc. (see the sketch below)
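As an illustration only — the factors, weights, and threshold below are invented placeholders, not the ANUPBS algorithm — a pairwise suspend/resume comparison along these lines might look like:

```python
from dataclasses import dataclass

@dataclass
class Job:
    ncpus: int
    walltime_req: float    # requested walltime in seconds
    walltime_used: float   # walltime consumed so far
    suspended_time: float  # total time this job has spent suspended
    usage_share: float     # current share of the machine held by its user/project

def should_suspend(preemptor: Job, preemptee: Job) -> bool:
    """Toy pairwise test: should `preemptor` suspend `preemptee`?

    Purely illustrative of the kind of factors weighed; the weights and
    threshold are placeholders, not the real scheduler.
    """
    # Jobs close to completion make poor suspension targets.
    remaining = 1.0 - preemptee.walltime_used / preemptee.walltime_req
    # Jobs that have already been suspended a lot should be protected.
    already_suspended = preemptee.suspended_time / max(preemptee.walltime_req, 1.0)
    # Prefer to suspend work from users/projects with heavy current usage.
    fairness = preemptee.usage_share - preemptor.usage_share
    # Avoid suspending far more cores than the new job actually needs.
    overshoot = max(0, preemptee.ncpus - preemptor.ncpus) / max(preemptor.ncpus, 1)

    score = (2.0 * remaining
             - 3.0 * already_suspended
             + 1.0 * fairness
             - 0.5 * overshoot)
    return score > 0.5
```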
The Present - PBS Pro
• Using a PBS Pro 12.0-based custom branch
• Using backfill-based scheduling
• Customisation of our PBS Pro installation
• Allocation / accounting system “alloc_db”
• Priority calculation scripts
• Running CPU, memory, walltime limits
• Support for MUNGE authentication
The Present - Why PBS Pro?
• During acceptance testing of Raijin…
• Lead developer of ANUPBS left NCI
• PBS Pro offered a supported scheduler and resource manager
• Same heritage — more familiar to users
• Altair work with us to customise PBS Pro to our needs
• Good time to change — moving to a new machine
The Present - Suspend / Resume
• Looked at suspend/resume but found issues
• More CPUs being suspended than required to run a job
• Less flexibility in selection of jobs to suspend
• At one point the scheduler was trying to suspend jobs even when there were
CPUs free to run the new job!
• Jobs can end up suspended “forever” as their node keeps getting selected
• Only works well when suspending for high-priority “express” jobs
• Can only specify target jobs for preemption as a binary option
The Present - Suspend / Resume
• A small number of jobs as preemption targets means that those jobs get “picked on”
• Sample jobs run through our test cluster (based on a day of jobs in the real cluster)
Job   Walltime   Used Time   Suspend Time   % vs Used   % vs Request
A     2:24:00    20:47       3:21:20        968.72%     139.81%
B     4:48:00    1:57:20     7:05:56        363.01%     147.89%
C     2:00:00    1:15:56     1:06:47        87.95%      55.65%
D     1:00:00    13:17       49:59          376.29%     83.31%
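The percentage columns are simply the total suspend time divided by the time the job actually used and by its requested walltime. A quick check of the Job A row, with times converted to seconds:

```python
def to_seconds(t: str) -> int:
    """Convert "h:mm:ss" or "mm:ss" into seconds."""
    parts = [int(p) for p in t.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)
    h, m, s = parts
    return 3600 * h + 60 * m + s

suspended = to_seconds("3:21:20")   # 12,080 s
used = to_seconds("20:47")          #  1,247 s
requested = to_seconds("2:24:00")   #  8,640 s

print(f"% vs used:    {100 * suspended / used:.2f}%")       # 968.72%
print(f"% vs request: {100 * suspended / requested:.2f}%")  # 139.81%
```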
The Present - Accounting
• NCI allocates compute hours to projects across quarters
• The system parses accounting logs provided by PBS (see the sketch after this list)
• Accounting logs don’t include resource usage for jobs that get deleted due to
MOMs going down
• Further information is extracted from the job history by a cron job (qstat -fx)
• Once a project is out of quota for the quarter, its jobs may still be allowed to run as “bonus”
• Somewhat manual in some areas of the system, especially reporting
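The real alloc_db ingest is more involved, but as a minimal sketch — assuming the standard PBS accounting record layout of `timestamp;type;jobid;key=value ...`, and assuming the project is carried in the job's account attribute — summing CPU-hours from job-end records might look like:

```python
import collections
import sys

def parse_record(line):
    """Split one PBS accounting log line into (timestamp, type, jobid, attrs)."""
    timestamp, rtype, jobid, message = line.rstrip("\n").split(";", 3)
    attrs = dict(f.split("=", 1) for f in message.split() if "=" in f)
    return timestamp, rtype, jobid, attrs

def walltime_hours(t):
    h, m, s = (int(x) for x in t.split(":"))
    return h + m / 60 + s / 3600

usage = collections.defaultdict(float)  # CPU-hours per project

with open(sys.argv[1]) as log:
    for line in log:
        _, rtype, _, attrs = parse_record(line)
        if rtype != "E":  # only count job-end records
            continue
        ncpus = int(attrs.get("resources_used.ncpus", 0))
        hours = walltime_hours(attrs.get("resources_used.walltime", "0:00:00"))
        usage[attrs.get("account", "unknown")] += ncpus * hours

for project, cpu_hours in sorted(usage.items()):
    print(f"{project}: {cpu_hours:.1f} CPU-hours")
```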
The Present - License Daemon
• Our own custom License Shadow Daemon (“lsd”)
• One lsd for multiple clusters
• Tracks usage of licenses across many commercial packages
• Has knowledge of pattern of license usage for different resource requests
• Users should request software in their job scripts (-l software=fluent)
• Jobs that request licenses have them “reserved” by lsd
• Can detect “rogue” jobs using licenses with no PBS request
• Hooks integrate with lsd to reject jobs at run time when licenses aren’t free (see the sketch below)
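A simplified runjob-hook sketch is below. The local client daemon socket and the JSON request format are assumptions made for illustration, not the actual lsd protocol.

```python
# Simplified runjob hook sketch; the lsd client socket and message format
# below are illustrative assumptions, not the real protocol.
import json
import socket
import pbs

LSD_CLIENT_SOCKET = "/var/run/lsd-client.sock"  # hypothetical local daemon

e = pbs.event()
job = e.job

software = job.Resource_List["software"]
if software is None:
    e.accept()  # no licensed software requested, nothing to check

try:
    # Ask the local lsd client to reserve licenses for this job.
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(LSD_CLIENT_SOCKET)
    sock.sendall(json.dumps({"op": "reserve",
                             "job": job.id,
                             "software": str(software)}).encode())
    reply = json.loads(sock.recv(4096).decode())
    sock.close()
except Exception as err:
    e.reject("license check unavailable: %s" % err)

if not reply.get("granted", False):
    # Licenses are busy: reject this run attempt so the job is tried again later.
    e.reject("licenses for %s are not currently free" % software)

e.accept()
```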
The Present - Local “jobfs”
• We allow users to request temporary storage that is node local
• Handled with 3 hooks:
• Run job hook creating a temporary folder on local disk (sketched after this list)
• Periodic hook checking usage of temporary folder
• End job hook deleting temporary folder when complete
• Some issues currently
• Nothing automates cleanup if it fails (a script takes care of it manually)
• Periodic hook is not completely accurate
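A rough sketch of the first hook (execjob_begin), under the assumptions that the node-local filesystem is mounted at /jobfs and that the custom resource is called jobfs:

```python
# execjob_begin sketch: create a per-job scratch directory on local disk.
# The /jobfs mount point and the "jobfs" resource name are assumptions here.
import os
import pwd
import pbs

JOBFS_ROOT = "/jobfs"

e = pbs.event()
job = e.job

if job.Resource_List["jobfs"] is None:
    e.accept()  # job did not request node-local scratch

jobdir = os.path.join(JOBFS_ROOT, job.id)
try:
    os.makedirs(jobdir, 0o700)
    # Hand the directory to the submitting user.
    owner = pwd.getpwnam(job.Job_Owner.split("@")[0])
    os.chown(jobdir, owner.pw_uid, owner.pw_gid)
except OSError as err:
    e.reject("could not create jobfs directory %s: %s" % (jobdir, err))

e.accept()
```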
The Present - Other Features
• Hooks implementing:
• Node health check before running a job (see the sketch after this list)
• Job resource summary at end of job output file
• Enable/disable hyperthreading if job requests it
• Old features not implemented in our current system:
• Process containment with cpusets / cgroups
• Full suspend/resume scheduling
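A sketch of what the node health hook does, so the job never starts on a broken node; the required mount points and the load threshold are illustrative assumptions:

```python
# execjob_begin sketch: refuse to start a job on an unhealthy node.
# The required mount points and load threshold are assumptions.
import os
import pbs

REQUIRED_MOUNTS = ["/short", "/home"]  # hypothetical global filesystems
MAX_LOAD_PER_CORE = 2.0

e = pbs.event()

# Check that the global filesystems the job will need are mounted.
for mount in REQUIRED_MOUNTS:
    if not os.path.ismount(mount):
        e.reject("node health: %s is not mounted" % mount)

# Check that the node is not already overloaded.
load1, _, _ = os.getloadavg()
ncores = os.sysconf("SC_NPROCESSORS_ONLN")
if load1 > MAX_LOAD_PER_CORE * ncores:
    e.reject("node health: 1-minute load %.1f is too high" % load1)

e.accept()
```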
The Future - Lessons from Present
• Slowness in scheduling cycles is contributed to by:
• Time taken for some of the hooks we use (optimisation required)
• Ordering of hooks
• Calculation of start time for top jobs (sometimes)
• At times the server blocks waiting for MOM responses
The Future - PBS Pro 13
• Currently in the early stages of testing PBS Pro 13.0
• New PBS features we’re most interested in:
• Support for cgroups
• Node health hooks
• Hook configuration files
• Asynchronous scheduling
The Future - Our Customisations
• Enhancing suspend/resume support in PBS Pro
• Investigating use of an anti-express queue to allow all bonus jobs to be preempted
by non-bonus jobs
• Better ways of targeting suspension
• Allocation management upgrades
• Speed enhancements
• Further automation
The Future - Cloud
• Integration with our cloud environments for job submission
• Requires MUNGE to support multiple realms so that we don’t leak our key
• Opportunistic scheduling of jobs from Raijin HPC to Cloud
• What jobs are suitable to be moved?
• What cloud environments can we use to do this?
• Local OpenStack (Tenjin)
• Australian Federated NeCTAR Cloud (OpenStack)
• Public cloud (Amazon, Azure, etc)
Questions?
Providing Australian researchers
with world-class computing services
NCI Contacts
General enquiries: +61 2 6125 9800
Media enquiries: +61 2 6125 4389
Help desk: help@nci.org.au
Address:
NCI, Building 143, Ward Road
The Australian National University
Canberra ACT 0200
Editor's notes
  1. Vayu is the Hindu god of wind.
  2. Raijin is the Japanese god of thunder and lightning in the Shinto religion.
  3. A 2-hour drain is used when bringing nodes back from failures, to ensure that if the fault returns, less compute is impacted.
  4. Do not think in terms of backfill at all; backfill has nothing to do with this algorithm.
  5. Express-only preemption is quite different to how we worked with ANUPBS. The “suspended forever” problem may be fixable with starving-job parameters, but it’s very hard for us to define a real time at which a job counts as starving. Targeting jobs for preemption is done with the preempt_targets resource and a binary “suspendable” flag. That doesn’t allow us to do something like “preempt jobs with priority < 100”, and it also doesn’t allow “normal” jobs to suspend other “normal” jobs.
  6. The jobs used here are essentially all at 10% of the walltimes of real jobs. All of these jobs were submitted in one go at the start, and their priorities didn’t change significantly during the run. Earlier jobs with more suspended time had more competition as the higher-priority work got run; in production, jobs arrive spread out over time.
  7. Accounting logs not containing everything is annoying for us, as we charge for time used even if the system causes a job to fail. This is not as tightly integrated as the ANUPBS / RASH setup was.
  8. We have to run a client daemon on the PBS servers to maintain a persistent connection to lsd; this continues what the old ANUPBS did. Hooks talk to the local daemon.
  9. The periodic hook doesn’t pick up the unlink-then-fill disk pattern (it only checks files still present in the directory).
  10. The tentative deployment date is early 2016.
  11. Anti-express queues would potentially require setting soft limits for projects (maybe automatically once a project is over quota in the allocation system?).
  12. We have a MUNGE patch in test that provides multiple realms. NeCTAR is an Australian government-funded cloud environment operated by a number of institutions (NCI hosts a node).