Introduction to SLURM
Ismael Fernández Pavón
Cristian Gomollon Escribano
19 / 02 / 2020
What is SLURM?
What is SLURM?
• Allocates access to resources for some duration of time.
• Provides a framework for starting, executing, and
monitoring work (normally a parallel job).
• Arbitrates contention for resources by managing
a queue of pending work.
Cluster manager and job scheduler
system for large and small Linux
clusters.
What is SLURM?
Resource managers only: ALPS (Cray), Torque
Schedulers only: Maui, Moab
Both resource manager and scheduler: LoadLeveler (IBM), LSF, SLURM, PBS Pro
✓ Open source
✓ Fault-tolerant
✓ Highly scalable
Resource Manager
SLURM: Resource Management
Cluster:
Collection of many separate
servers (nodes), connected
via a fast interconnect.
(Diagram: a node and its CPUs, i.e. cores and hardware threads.)
SLURM: Resource Management
Nodes:
• Baseboards, Sockets,
Cores, Threads, (CPUs)
• Memory size
• Generic resources
(GRES)
• Features
• State
(GPGPUs appear in the diagram as generic resources, GRES.)
Node: an individual computer, one component of an HPC system.
SLURM: Resource Management
Partitions:
• Associated with a specific set of nodes
• Nodes can be in more
than one partition
• Job size and time limits
• Access control list
• State information
Partition: a logical group of nodes with common specs.
SLURM: Resource Management
(Diagram: a job's allocated cores and allocated memory within the partition.)
Jobs:
• ID (a number)
• Name
• Time limit
• Size specification
• Dependencies on other jobs
• State
Allocations of resources
assigned to a user for a
specified amount of time.
SLURM: Resource Management
(Diagram: the cores and memory used by each job step within a job's allocation.)
Job steps:
• ID (a number)
• Name
• Time limit
• Size specification
Sets of (possibly parallel)
tasks within a job.
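As an illustrative sketch (script and program names are hypothetical), a batch job can launch several steps with srun:
#!/bin/bash
#SBATCH -n 4              # 4 tasks allocated to the job
srun -n 4 ./preprocess    # job step 0: uses the whole allocation
srun -n 2 ./solve_a &     # job steps can also run side by side...
srun -n 2 ./solve_b &     # ...splitting the allocation between them
wait                      # wait for both background steps to finish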
SLURM: Resource Management
FULL CLUSTER
Job scheduling time!
SLURM: Job Scheduling
Scheduling: the process of deciding which job runs next and on which resources.
SLURM: Job Scheduling
FIFO Scheduling
(Chart: jobs placed on a Resources x Time grid in submission order.)
Backfill Scheduling
• Job priority
• Time limit (Important!)
SLURM: Job Scheduling
Backfill Scheduling:
• Based on the job request, resources available, and
policy limits imposed.
• Starts with job priority.
• Higher priority jobs cannot be delayed by lower priority
jobs.
• The expected start time of pending jobs depends upon the
expected completion time of running jobs, so reasonably
accurate time limits matter.
• Results in a resource allocation over a period of time.
Backfill Scheduling:
• E.g.: a new lower-priority job.
(Chart sequence over Resources x Time; each running job shows its elapsed time and its time limit. In the first sequence the new job is submitted and simply waits behind the queue: wait time 7. In the second sequence, accurate time limits let the scheduler backfill the job into an idle gap without delaying any higher-priority job: wait time 1.)
SLURM: Job Scheduling
Backfill Scheduling:
• Starts with job priority.
Job_priority =
= site_factor +
+ (PriorityWeightQOS) * (QOS_factor) +
+ (PriorityWeightPartition) * (partition_factor) +
+ (PriorityWeightFairshare) * (fair-share_factor) +
+ (PriorityWeightAge) * (age_factor) +
+ (PriorityWeightJobSize) * (job_size_factor) +
+ (PriorityWeightAssoc) * (assoc_factor) +
+ SUM(TRES_weight_<type> * TRES_factor_<type>…)
− nice_factor
In the formula above, the PriorityWeight* weights are fixed values (site configuration), the *_factor terms are dynamic values, and nice_factor is a user-defined value.
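Tip: the sprio command reports these factors for each pending job; 'sprio -l' lists the weighted value of every factor.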
Backfill Scheduling:
• Priority factor:
SLURM: Job Scheduling
QoS:
• Account’s Priority:
− Normal
− Low
Backfill Scheduling:
• Priority factor:
SLURM: Job Scheduling
Partition:
• It only affects RES users:
− class_a
− class_b
− class_c
Backfill Scheduling:
• Priority factor:
SLURM: Job Scheduling
Fairshare:
• Depends on past consumption and on the resources requested.
Backfill Scheduling:
• Priority factor:
SLURM: Job Scheduling
Age:
• Priority increases the longer the job waits in the queue.
• Max 7 days.
• Does not accrue for dependent jobs!
Backfill Scheduling:
• Priority factor:
SLURM: Job Scheduling
Job size:
• Bigger jobs get higher priority.
• Based ONLY on resources requested, NOT on time.
Commands
• sbatch – Submit a batch script.
• salloc – Request resources for an interactive job.
• srun – Start a new task (job step).
• scancel – Cancel a job (a typical workflow is sketched below).
SLURM: Commands
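A minimal workflow, as a sketch (job.slm is a hypothetical script name):
sbatch job.slm       # submit the script; prints the job ID
squeue -u $USER      # check the job's state
scancel <jobid>      # cancel it if needed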
• sinfo – Report system status (nodes, queues, etc.).
PARTITION AVAIL TIME NODES STATE NODELIST
std* up inf+ 2 mix pirineus[15,21]
std* up inf+ 30 alloc pirineus[13-14,16-20,22-44]
std-fat up inf+ 3 idle~ pirineus[45,49-50]
std-fat up inf+ 3 alloc pirineus[46-48]
gpu up inf+ 2 idle~ pirineusgpu[3-4]
gpu up inf+ 1 mix pirineusgpu2
knl up inf+ 3 idle~ pirineusknl[2-4]
mem up inf+ 1 mix canigo1
class_a up inf+ 1 idle~ pirineus12
class_a up inf+ 2 mix canigo1,pirineus11
class_a up inf+ 8 alloc pirineus[1-6,8-9]
class_a up inf+ 2 resv pirineus[7,10]
class_c up inf+ 1 idle~ pirineus12
class_c up inf+ 2 mix canigo1,pirineus11
class_c up inf+ 8 alloc pirineus[1-6,8-9]
class_c up inf+ 2 resv pirineus[7,10]
SLURM: Commands
• sinfo – Report system status.
-N Node-oriented format information, with one line per
node and partition.
-p Print information only about the specified partition(s).
--Format Specify the information to be displayed.
"Nodelist,Partition,StateCompact,CpusState,Memory,Freemem"
NODELIST PARTITION STATE CPUS(A/I/O/T) MEMORY FREE_MEM
canigo1 class_a mix 112/80/0/192 4643070 2458001
pirineus1 class_a idle~ 0/48/0/48 191904 188950
pirineus2 class_a alloc 48/0/0/48 191904 44123
pirineus3 class_a alloc 48/0/0/48 191904 41831
pirineus4 class_a mix 32/16/0/48 191904 66623
pirineus5 class_a mix 16/32/0/48 191904 162277
pirineus6 class_a alloc 48/0/0/48 191904 82747
pirineus7 class_a idle~ 0/48/0/48 191904 189289
SLURM: Commands
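For instance, the node-oriented listing above can be reproduced with:
sinfo -N -p class_a --Format="Nodelist,Partition,StateCompact,CpusState,Memory,Freemem"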
• sinfo – Report system status.
-s List only a partition state summary with no node state details.
PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
std* up infinite 32/0/0/32 pirineus[13-44]
std-fat up infinite 3/3/0/6 pirineus[45-50]
gpu up infinite 1/2/0/3 pirineusgpu[2-4]
knl up infinite 0/3/0/3 pirineusknl[2-4]
mem up infinite 1/0/0/1 canigo1
class_a up infinite 10/3/0/13 canigo1,pirineus[1-12]
class_b up infinite 10/3/0/13 canigo1,pirineus[1-12]
class_c up infinite 10/3/0/13 canigo1,pirineus[1-12]
SLURM: Commands
• sinfo – Report system status.
-s List only a partition state summary with no node state details.
TIP: Use system-status.
SLURM: Commands
+-----------+-------------+-----------------+--------------+------------+
| MACHINE | TOTAL SLOTS | ALLOCATED SLOTS | QUEUED SLOTS | OCCUPATION |
+-----------+-------------+-----------------+--------------+------------+
| std nodes | 1536 | 1468 | 2212 | 95 % |
| fat nodes | 288 | 144 | 0 | 50 % |
| mem nodes | 96 | 96 | 289 | 100 % |
| gpu nodes | 144 | 96 | 252 | 66 % |
| knl nodes | 816 | 0 | 0 | 0 % |
| res nodes | 672 | 648 | 1200 | 96 % |
+-----------+-------------+-----------------+--------------+------------+
• squeue – Report job and job step status.
JOBID PARTIT NAME USER ST TIME NODES NODELIST
1222376 mem dada2 mvelasco PD 0:00 1 (Resources)
1221504 std Freq_TS_ uabqut16 PD 0:00 1 (Resources)
1222346 std Cu2T-tra agusti PD 0:00 1 (Priority)
1222347 std AuIPr_Ph sciortin PD 0:00 1 (Priority)
1220930 std nickeloc ubaqis07 PD 0:00 1 (Priority)
1222351 std g09d1 upceqt04 R 2:18:20 1 pirineus21
1221621 mem C3 vpenya R 23:56:04 1 canigo1
1221569 std preTS_VI porellan R 19:39:13 1 pirineus17
1221543 std Au2-Cl-d agusti R 1-13:40:32 1 pirineus22
1221616 std-fat CuII_mod mariona R 1-10:35:33 1 pirineus47
1221617 std-fat CuIII_mo mariona R 1-10:35:33 1 pirineus48
1221461 std opt-1xe2 pbesalu R 2-11:22:43 1 pirineus37
1221413 std s24ls_de jcirera R 4:08:01 1 pirineus22
1220720 std nickeloc ubaqis07 R 4-03:00:44 2 pirineus[34-35]
1220719 std nickeloc ubaqis07 R 4-03:00:48 1 pirineus14
1221546 mem C60-Zn-T pbesalu R 22:31:12 1 canigo1
SLURM: Commands
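Tip: 'squeue -u $USER' lists only your own jobs, and 'squeue --start' reports the scheduler's estimated start time for pending jobs.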
• scontrol – Administrator tool to view and/or update
system, job, step, partition or reservation status.
scontrol hold <jobid>
scontrol release <jobid>
scontrol show job <jobid>
SLURM: Commands
SLURM: Commands
JobId=1222543 JobName=test_large_g16.slm
UserId=ifernandez(80347) GroupId=csuc(10000) MCS_label=N/A
Priority=100209 Nice=0 Account=csuc QOS=test
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=05:04:05 TimeLimit=1-00:00:00 TimeMin=N/A
SubmitTime=2020-01-16T09:55:19 EligibleTime=2020-01-16T09:55:19
AccrueTime=2020-01-16T09:55:19
StartTime=2020-01-16T09:55:20 EndTime=2020-01-17T09:55:21 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-01-16T09:55:20
Partition=std AllocNode:Sid=192.168.19.26:7243
ReqNodeList=(null) ExcNodeList=(null)
NodeList=pirineus17
BatchHost=pirineus17
NumNodes=1 NumCPUs=4 NumTasks=1 CPUs/Task=4 ReqB:S:C:T=0:0:*:*
TRES=cpu=4,mem=15600M,node=1,billing=4
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=4 MinMemoryCPU=3900M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/ifernandez/examples/gaussian/g16/large/test_large_g16.slm
WorkDir=/home/ifernandez/examples/gaussian/g16/large
StdErr=/home/ifernandez/examples/gaussian/g16/large/slurm-1222543.out
StdIn=/dev/null
StdOut=/home/ifernandez/examples/gaussian/g16/large/slurm-1222543.out
Power=
Job Life
SLURM: Job Life
(State diagram: SUBMISSION → PENDING (CONFIGURING) → RUNNING → COMPLETING → COMPLETED. A pending job can be HELD and RELEASED or REQUEUED; a running job can RESIZE or be CANCELED, and may instead end in TIMEOUT, FAIL, OUT OF MEMORY, NODE FAIL, or SPECIAL EXIT.)
Pending Reasons:
• Priority: One or more higher-priority jobs exist for this partition or advanced reservation.
• Resources: The job is waiting for resources to become available.
• Reservation: The job is waiting for its advanced reservation to become available.
• ReqNodeNotAvail: Some node specifically required by the job is not currently available.
• JobHeldAdmin / JobHeldUser: The job is held by a system administrator / the user.
• Dependency: This job is waiting for a dependent job to complete (see the example below).
• BadConstraints: The job's constraints cannot be satisfied.
• InvalidQOS: The job's QOS is invalid. Account's assigned time exhausted?
• AssociationTimeLimit: The job's association has reached its time limit. Account's assigned time exhausted?
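Dependencies are declared at submission time; a sketch (script names hypothetical):
jid=$(sbatch --parsable pre.slm)            # submit the first job and capture its ID
sbatch --dependency=afterok:$jid solve.slm  # runs only if the first job completes successfully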
SLURM: News
• SLURM upgrade to 19.05.
• New job state: OUT_OF_MEMORY (job killed by the OOM killer).
• Fixed ratio between memory and CPU:
Partition   Max. mem per CPU (MB)   Max. mem per CPU (GB)
std         3900 MB                 3.8 GB
std-fat     7900 MB                 7.7 GB
mem         24180 MB                23.6 GB
Questions?
Enjoy SLURM!
How to launch jobs?
Login on CSUC infrastructure
• Login
ssh -p 2122 username@hpc.csuc.cat
• Transfer files
scp -P 2122 local_file username@hpc.csuc.cat:[path to your folder]
sftp -oPort=2122 username@hpc.csuc.cat
• Useful paths
Name                      Variable                  Availability        Quota/project    Time limit  Backup
/home/$user               $HOME                     global              >64 GB           unlimited   Yes
/scratch/$user            $SCRATCH                  global              unlimited        30 days     No
/scratch/$user/tmp/jobid  $TMPDIR / $SHAREDSCRATCH  global              job file limit   1 week      No
/tmp/$user/jobid          $TMPDIR / $LOCALSCRATCH   local to each node  job file limit   1 week      No
• Get HC consumption
consum -a <year>                  (group consumption)
consum -a <year> -u <username>    (user consumption)
Batch job submission: Default settings
• 4-8 GB/core (std and std-fat partitions, respectively).
• 24 GB/core on the mem partition.
• 1 core on the std, std-fat and mem partitions.
• 24 cores and 1 GPU on the gpu partition.
• The whole node on the KNL partition.
• Non-exclusive, multinode job.
• Working and output directories are the submit directory.
Batch job submission
• Basic Linux commands:
Description                Command   Example
List files                 ls        ls /home/user
Make a folder              mkdir     mkdir /home/prova
Change folder              cd        cd /home/prova
Copy files                 cp        cp file1 file2
Move a file                mv        mv /home/prova.txt /cescascratch/prova.txt
Delete a file              rm        rm filename
Print file content         cat       cat filename
Find a string in files     grep      grep 'word' filename
List last lines of a file  tail      tail filename
• Text editors: vim, nano, emacs, etc.
• More detailed info and options about the commands:
'command' --help
man 'command'
Batch job submission: The slurm submit script
#!/bin/bash
# Scheduler directives:
#SBATCH -J JOB_NAME
#SBATCH -o OUTPUT_FILE.log
#SBATCH -e ERROR_FILE.err
#SBATCH -p PARTITION
#SBATCH --mem=TOTMEM
#SBATCH -n NTASKS
#SBATCH -c NCORES_PER_TASK
# Set up the environment variables and paths:
module load mpi/intel/openmpi/3.1.0
# Move the input files to the working directory:
cp -r $input $SCRATCH
cd $SCRATCH
# Launch the application (similar to mpirun):
srun $APPLICATION
# Create the output folder and move the outputs:
mkdir -p $OUTPUT_DIR
cp -r * $OUTPUT_DIR
Scheduler directives/Options: #SBATCH
• -c, --cpus-per-task=ncpus   number of cpus required per task
• --gres=list                 required generic resources
• -J, --job-name=jobname      name of job
• -n, --ntasks=ntasks         number of tasks to run
• --ntasks-per-node=n         number of tasks to invoke on each node
• -N, --nodes=N               number of nodes on which to run (N = min[-max])
• -o, --output=out            file for batch script's standard output
• -p, --partition=partition   partition requested
• -C, --constraint=list       specify a list of constraints (mem, vnc, ...)
• --mem=MB                    minimum amount of total real memory
• --mem-per-cpu=MB            amount of real memory per allocated core
• --reservation=name          allocate resources from named reservation
• -w, --nodelist=hosts...     request a specific list of hosts
• -t, --time=time             job max duration, format dd-hh:mm (Mandatory!!)
More commands/info: type 'sbatch -h'
How to generate slurm script files: 1. Identify app parallelism
Thread parallelism:
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=NCORES
Process parallelism:
#SBATCH --ntasks=NCORES
#SBATCH --cpus-per-task=1
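For hybrid applications (e.g. MPI + OpenMP) the two directives combine; a sketch:
#SBATCH --ntasks=NTASKS            # one task per MPI rank
#SBATCH --cpus-per-task=NTHREADS   # threads per rank
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK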
How to generate slurm script files: 2. Determine the memory requirements
The partition choice strongly depends on the job memory requirements!!

#SBATCH --mem=63900
#SBATCH --cpus-per-task=8
#SBATCH --partition=std-fat

#SBATCH --mem=63900
#SBATCH --cpus-per-task=16
#SBATCH --partition=std

#SBATCH --mem=63900
#SBATCH --cpus-per-task=4
#SBATCH --partition=mem

#SBATCH --mem-per-cpu=3900
#SBATCH --ntasks=16
#SBATCH --partition=std

Partition     Memory/core
std/gpu       4 GB
std-fat/KNL   8 GB
mem           24 GB
How to generate slurm script files: 3. Runtime requirements
#SBATCH --time=Thpc
Performance comparison:
WORKSTATION: 4 cores (Nws), 8-16 GB RAM, 1 TB disk at 600 MB/s, 1-10 Gb/s Ethernet
HPC NODE: 48 cores (Nhpc), 192 GB RAM, 200 TB disk at 4 GB/s, 100-200 Gb/s Infiniband
At first approximation:
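A plausible first approximation, assuming near-linear scaling with core count:
Thpc ≈ Tws × (Nws / Nhpc)
e.g. a run taking Tws on the 4-core workstation needs roughly Tws × 4/48 on a 48-core node, plus a safety margin on the requested time limit.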
How to generate slurm script files: 4. Disk/IO requirements
Two kinds of applications:
Threaded/serial (only one node):
cd $SHAREDSCRATCH
or
cd $LOCALSCRATCH
Multitask (multinode):
cd $SHAREDSCRATCH
Or let SLURM decide for you:
cd $SCRATCH
How to generate slurm script files: Summary
1. Identify your application parallelism.
2. Estimate the amount of resources needed by your solving algorithm.
3. Estimate the runtime as accurately as possible.
4. Determine your job's I/O and input requirements.
5. Determine which output files are necessary and save only those files in your own disk space.
Gaussian 16 (Threaded Example)
#!/bin/bash
# Threaded application: 1 task with 16 cores.
# Less than 4 GB/core: std partition. 10 days runtime.
#SBATCH -J gau16_test
#SBATCH -o gau_test_%j.log
#SBATCH -e gau_test_%j.err
#SBATCH -n 1
#SBATCH -c 16
#SBATCH -p std
#SBATCH --mem=30000
#SBATCH --time=10-00
# Set up the environment to run the app:
module load gaussian/g16b1
INPUT_DIR=$HOME/gaussian_test/inputs
OUTPUT_DIR=$HOME/gaussian_test/outputs
cd $SCRATCH
cp -r $INPUT_DIR/* .
g16 < input.gau > output.out
mkdir -p $OUTPUT_DIR
cp -r output.out $OUTPUT_DIR
Vasp 5.4.4 (Multitask Example)
#!/bin/bash
# Multitask application: 24 tasks, 1 core each.
# More than 4 GB/core but less than 8 GB/core: std-fat partition. 20 min runtime.
#SBATCH -J vasp_test
#SBATCH -o vasp_test_%j.log
#SBATCH -e vasp_test_%j.err
#SBATCH -n 24
#SBATCH -c 1
#SBATCH --mem-per-cpu=7500
#SBATCH -p std-fat
#SBATCH --time=20:00
# Set up the environment to run the app:
module load vasp/5.4.4
INPUT_DIR=$HOME/vasp_test/inputs
OUTPUT_DIR=$HOME/vasp_test/outputs
cd $SCRATCH
cp -r $INPUT_DIR/* .
# Multitask apps require the 'srun' command:
srun `which vasp_std`
mkdir -p $OUTPUT_DIR
cp -r * $OUTPUT_DIR
Gromacs (Multitask and Threaded Example)
#!/bin/bash
# 1-node hybrid job! 2 GPUs/node on the gpu partition.
#SBATCH --job-name=gromacs
#SBATCH --output=gromacs_%j.out
#SBATCH --error=gromacs_%j.err
#SBATCH -n 24
#SBATCH -c 2
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH --gres=gpu:2
#SBATCH --time=00:30:00
module load gromacs/2018.4_mpi
cd $SHAREDSCRATCH
cp -r $HOME/SLMs/gromacs/CASE/* .
srun `which gmx_mpi` mdrun -v -deffnm input_system -ntomp $SLURM_CPUS_PER_TASK -nb gpu -npme 12 -dlb yes -pin on -gpu_id 01
cp -r * /scratch/$USER/gromacs/CASE/output/
ANSYS Fluent (Multitask Example)
#!/bin/bash
#SBATCH -J truck.cas
#SBATCH -o truck.log
#SBATCH -e truck.err
#SBATCH -p std
#SBATCH -n 16
#SBATCH --time=10-20:00
module load toolchains/gcc_mkl_ompi
INPUT_DIR=$HOME/FLUENT/inputs
OUTPUT_DIR=$HOME/FLUENT/outputs
cd $SCRATCH
cp -r $INPUT_DIR/* .
/prod/ANSYS16/v162/fluent/bin/fluent 3ddp -t$SLURM_NTASKS -mpi=hp -g -i input1_50.txt
mkdir -p $OUTPUT_DIR
cp -r * $OUTPUT_DIR
Best Practices
• Use $SCRATCH as the working directory.
• Move only the necessary files (not all files in the folder each time).
• Try to keep important files only in $HOME.
• Try to choose the partition and resources that best fit your job.
Thank you for your attention!

Weitere ähnliche Inhalte

Was ist angesagt?

L2 over L3 ecnaspsulations
L2 over L3 ecnaspsulationsL2 over L3 ecnaspsulations
L2 over L3 ecnaspsulationsMotonori Shindo
 
[D12] NonStop SQLって何? by Susumu Yamamoto
[D12] NonStop SQLって何? by Susumu Yamamoto[D12] NonStop SQLって何? by Susumu Yamamoto
[D12] NonStop SQLって何? by Susumu YamamotoInsight Technology, Inc.
 
Best Practices for Getting Started with NGINX Open Source
Best Practices for Getting Started with NGINX Open SourceBest Practices for Getting Started with NGINX Open Source
Best Practices for Getting Started with NGINX Open SourceNGINX, Inc.
 
Dockerからcontainerdへの移行
Dockerからcontainerdへの移行Dockerからcontainerdへの移行
Dockerからcontainerdへの移行Akihiro Suda
 
イマドキのExcelスクショの撮り方
イマドキのExcelスクショの撮り方イマドキのExcelスクショの撮り方
イマドキのExcelスクショの撮り方Yoshitaka Kawashima
 
OSC2011 Tokyo/Fall 濃いバナ(virtio)
OSC2011 Tokyo/Fall 濃いバナ(virtio)OSC2011 Tokyo/Fall 濃いバナ(virtio)
OSC2011 Tokyo/Fall 濃いバナ(virtio)Takeshi HASEGAWA
 
コンテナにおけるパフォーマンス調査でハマった話
コンテナにおけるパフォーマンス調査でハマった話コンテナにおけるパフォーマンス調査でハマった話
コンテナにおけるパフォーマンス調査でハマった話Yuta Shimada
 
Ansible ネットワーク自動化チュートリアル (JANOG42)
Ansible ネットワーク自動化チュートリアル (JANOG42)Ansible ネットワーク自動化チュートリアル (JANOG42)
Ansible ネットワーク自動化チュートリアル (JANOG42)akira6592
 
不揮発メモリ(NVDIMM)とLinuxの対応動向について
不揮発メモリ(NVDIMM)とLinuxの対応動向について不揮発メモリ(NVDIMM)とLinuxの対応動向について
不揮発メモリ(NVDIMM)とLinuxの対応動向についてYasunori Goto
 
20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)Kentaro Ebisawa
 
Karpenterで君だけの最強のオートスケーリングを実装しよう
Karpenterで君だけの最強のオートスケーリングを実装しようKarpenterで君だけの最強のオートスケーリングを実装しよう
Karpenterで君だけの最強のオートスケーリングを実装しようKohei Nagase
 
オススメのJavaログ管理手法 ~コンテナ編~(Open Source Conference 2022 Online/Spring 発表資料)
オススメのJavaログ管理手法 ~コンテナ編~(Open Source Conference 2022 Online/Spring 発表資料)オススメのJavaログ管理手法 ~コンテナ編~(Open Source Conference 2022 Online/Spring 発表資料)
オススメのJavaログ管理手法 ~コンテナ編~(Open Source Conference 2022 Online/Spring 発表資料)NTT DATA Technology & Innovation
 
Onieで遊んでみようとした話
Onieで遊んでみようとした話Onieで遊んでみようとした話
Onieで遊んでみようとした話Masaru Oki
 
PostgreSQL 12は ここがスゴイ! ~性能改善やpluggable storage engineなどの新機能を徹底解説~ (NTTデータ テクノ...
PostgreSQL 12は ここがスゴイ! ~性能改善やpluggable storage engineなどの新機能を徹底解説~ (NTTデータ テクノ...PostgreSQL 12は ここがスゴイ! ~性能改善やpluggable storage engineなどの新機能を徹底解説~ (NTTデータ テクノ...
PostgreSQL 12は ここがスゴイ! ~性能改善やpluggable storage engineなどの新機能を徹底解説~ (NTTデータ テクノ...NTT DATA Technology & Innovation
 
CXL_説明_公開用.pdf
CXL_説明_公開用.pdfCXL_説明_公開用.pdf
CXL_説明_公開用.pdfYasunori Goto
 
KVM環境におけるネットワーク速度ベンチマーク
KVM環境におけるネットワーク速度ベンチマークKVM環境におけるネットワーク速度ベンチマーク
KVM環境におけるネットワーク速度ベンチマークVirtualTech Japan Inc.
 
サイボウズの CI/CD 事情 〜Jenkins おじさんは CircleCI おじさんにしんかした!〜
サイボウズの CI/CD 事情 〜Jenkins おじさんは CircleCI おじさんにしんかした!〜サイボウズの CI/CD 事情 〜Jenkins おじさんは CircleCI おじさんにしんかした!〜
サイボウズの CI/CD 事情 〜Jenkins おじさんは CircleCI おじさんにしんかした!〜Jumpei Miyata
 

Was ist angesagt? (20)

L2 over L3 ecnaspsulations
L2 over L3 ecnaspsulationsL2 over L3 ecnaspsulations
L2 over L3 ecnaspsulations
 
[D12] NonStop SQLって何? by Susumu Yamamoto
[D12] NonStop SQLって何? by Susumu Yamamoto[D12] NonStop SQLって何? by Susumu Yamamoto
[D12] NonStop SQLって何? by Susumu Yamamoto
 
Best Practices for Getting Started with NGINX Open Source
Best Practices for Getting Started with NGINX Open SourceBest Practices for Getting Started with NGINX Open Source
Best Practices for Getting Started with NGINX Open Source
 
Dockerからcontainerdへの移行
Dockerからcontainerdへの移行Dockerからcontainerdへの移行
Dockerからcontainerdへの移行
 
イマドキのExcelスクショの撮り方
イマドキのExcelスクショの撮り方イマドキのExcelスクショの撮り方
イマドキのExcelスクショの撮り方
 
OSC2011 Tokyo/Fall 濃いバナ(virtio)
OSC2011 Tokyo/Fall 濃いバナ(virtio)OSC2011 Tokyo/Fall 濃いバナ(virtio)
OSC2011 Tokyo/Fall 濃いバナ(virtio)
 
コンテナにおけるパフォーマンス調査でハマった話
コンテナにおけるパフォーマンス調査でハマった話コンテナにおけるパフォーマンス調査でハマった話
コンテナにおけるパフォーマンス調査でハマった話
 
Ansible ネットワーク自動化チュートリアル (JANOG42)
Ansible ネットワーク自動化チュートリアル (JANOG42)Ansible ネットワーク自動化チュートリアル (JANOG42)
Ansible ネットワーク自動化チュートリアル (JANOG42)
 
不揮発メモリ(NVDIMM)とLinuxの対応動向について
不揮発メモリ(NVDIMM)とLinuxの対応動向について不揮発メモリ(NVDIMM)とLinuxの対応動向について
不揮発メモリ(NVDIMM)とLinuxの対応動向について
 
20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)
 
Karpenterで君だけの最強のオートスケーリングを実装しよう
Karpenterで君だけの最強のオートスケーリングを実装しようKarpenterで君だけの最強のオートスケーリングを実装しよう
Karpenterで君だけの最強のオートスケーリングを実装しよう
 
オススメのJavaログ管理手法 ~コンテナ編~(Open Source Conference 2022 Online/Spring 発表資料)
オススメのJavaログ管理手法 ~コンテナ編~(Open Source Conference 2022 Online/Spring 発表資料)オススメのJavaログ管理手法 ~コンテナ編~(Open Source Conference 2022 Online/Spring 発表資料)
オススメのJavaログ管理手法 ~コンテナ編~(Open Source Conference 2022 Online/Spring 発表資料)
 
Onieで遊んでみようとした話
Onieで遊んでみようとした話Onieで遊んでみようとした話
Onieで遊んでみようとした話
 
LXDのすすめ
LXDのすすめLXDのすすめ
LXDのすすめ
 
PostgreSQL 12は ここがスゴイ! ~性能改善やpluggable storage engineなどの新機能を徹底解説~ (NTTデータ テクノ...
PostgreSQL 12は ここがスゴイ! ~性能改善やpluggable storage engineなどの新機能を徹底解説~ (NTTデータ テクノ...PostgreSQL 12は ここがスゴイ! ~性能改善やpluggable storage engineなどの新機能を徹底解説~ (NTTデータ テクノ...
PostgreSQL 12は ここがスゴイ! ~性能改善やpluggable storage engineなどの新機能を徹底解説~ (NTTデータ テクノ...
 
CXL_説明_公開用.pdf
CXL_説明_公開用.pdfCXL_説明_公開用.pdf
CXL_説明_公開用.pdf
 
CPUから見たG1GC
CPUから見たG1GCCPUから見たG1GC
CPUから見たG1GC
 
KVM環境におけるネットワーク速度ベンチマーク
KVM環境におけるネットワーク速度ベンチマークKVM環境におけるネットワーク速度ベンチマーク
KVM環境におけるネットワーク速度ベンチマーク
 
マスタリングTCP/IP ニフクラ編
マスタリングTCP/IP ニフクラ編マスタリングTCP/IP ニフクラ編
マスタリングTCP/IP ニフクラ編
 
サイボウズの CI/CD 事情 〜Jenkins おじさんは CircleCI おじさんにしんかした!〜
サイボウズの CI/CD 事情 〜Jenkins おじさんは CircleCI おじさんにしんかした!〜サイボウズの CI/CD 事情 〜Jenkins おじさんは CircleCI おじさんにしんかした!〜
サイボウズの CI/CD 事情 〜Jenkins おじさんは CircleCI おじさんにしんかした!〜
 

Ähnlich wie Introduction to SLURM

Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightDataStax Academy
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCoburn Watson
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016Brendan Gregg
 
How should I monitor my idaa
How should I monitor my idaaHow should I monitor my idaa
How should I monitor my idaaCuneyt Goksu
 
DB12c: All You Need to Know About the Resource Manager
DB12c: All You Need to Know About the Resource ManagerDB12c: All You Need to Know About the Resource Manager
DB12c: All You Need to Know About the Resource ManagerAndrejs Vorobjovs
 
Oracle Database Performance Tuning Basics
Oracle Database Performance Tuning BasicsOracle Database Performance Tuning Basics
Oracle Database Performance Tuning Basicsnitin anjankar
 
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)Spark Summit
 
DB12c: All You Need to Know About the Resource Manager
DB12c: All You Need to Know About the Resource ManagerDB12c: All You Need to Know About the Resource Manager
DB12c: All You Need to Know About the Resource ManagerMaris Elsins
 
1404 app dev series - session 8 - monitoring & performance tuning
1404   app dev series - session 8 - monitoring & performance tuning1404   app dev series - session 8 - monitoring & performance tuning
1404 app dev series - session 8 - monitoring & performance tuningMongoDB
 
How to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issuesHow to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issuesCloudera, Inc.
 
Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23)
Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23)Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23)
Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23)Jeff Hung
 
Oracle Database : Addressing a performance issue the drilldown approach
Oracle Database : Addressing a performance issue the drilldown approachOracle Database : Addressing a performance issue the drilldown approach
Oracle Database : Addressing a performance issue the drilldown approachLaurent Leturgez
 
Build an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on CephBuild an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on CephRongze Zhu
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACKristofferson A
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015PostgreSQL-Consulting
 

Ähnlich wie Introduction to SLURM (20)

Introduction to SLURM
Introduction to SLURMIntroduction to SLURM
Introduction to SLURM
 
Introduction to SLURM
Introduction to SLURMIntroduction to SLURM
Introduction to SLURM
 
Introduction to SLURM
 Introduction to SLURM Introduction to SLURM
Introduction to SLURM
 
Introduction to SLURM
Introduction to SLURMIntroduction to SLURM
Introduction to SLURM
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-Flight
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
How should I monitor my idaa
How should I monitor my idaaHow should I monitor my idaa
How should I monitor my idaa
 
DB12c: All You Need to Know About the Resource Manager
DB12c: All You Need to Know About the Resource ManagerDB12c: All You Need to Know About the Resource Manager
DB12c: All You Need to Know About the Resource Manager
 
Oracle Database Performance Tuning Basics
Oracle Database Performance Tuning BasicsOracle Database Performance Tuning Basics
Oracle Database Performance Tuning Basics
 
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
 
DB12c: All You Need to Know About the Resource Manager
DB12c: All You Need to Know About the Resource ManagerDB12c: All You Need to Know About the Resource Manager
DB12c: All You Need to Know About the Resource Manager
 
1404 app dev series - session 8 - monitoring & performance tuning
1404   app dev series - session 8 - monitoring & performance tuning1404   app dev series - session 8 - monitoring & performance tuning
1404 app dev series - session 8 - monitoring & performance tuning
 
Introduction to Slurm
Introduction to SlurmIntroduction to Slurm
Introduction to Slurm
 
How to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issuesHow to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issues
 
Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23)
Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23)Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23)
Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23)
 
Oracle Database : Addressing a performance issue the drilldown approach
Oracle Database : Addressing a performance issue the drilldown approachOracle Database : Addressing a performance issue the drilldown approach
Oracle Database : Addressing a performance issue the drilldown approach
 
Build an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on CephBuild an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on Ceph
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
 

Mehr von CSUC - Consorci de Serveis Universitaris de Catalunya

Mehr von CSUC - Consorci de Serveis Universitaris de Catalunya (20)

Tendencias en herramientas de monitorización de redes y modelo de madurez en ...
Tendencias en herramientas de monitorización de redes y modelo de madurez en ...Tendencias en herramientas de monitorización de redes y modelo de madurez en ...
Tendencias en herramientas de monitorización de redes y modelo de madurez en ...
 
Quantum Computing Master Class 2024 (Quantum Day)
Quantum Computing Master Class 2024 (Quantum Day)Quantum Computing Master Class 2024 (Quantum Day)
Quantum Computing Master Class 2024 (Quantum Day)
 
Publicar dades de recerca amb el Repositori de Dades de Recerca
Publicar dades de recerca amb el Repositori de Dades de RecercaPublicar dades de recerca amb el Repositori de Dades de Recerca
Publicar dades de recerca amb el Repositori de Dades de Recerca
 
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
 
Formació RDM: com fer un pla de gestió de dades amb l’eiNa DMP?
Formació RDM: com fer un pla de gestió de dades amb l’eiNa DMP?Formació RDM: com fer un pla de gestió de dades amb l’eiNa DMP?
Formació RDM: com fer un pla de gestió de dades amb l’eiNa DMP?
 
Com pot ajudar la gestió de les dades de recerca a posar en pràctica la ciènc...
Com pot ajudar la gestió de les dades de recerca a posar en pràctica la ciènc...Com pot ajudar la gestió de les dades de recerca a posar en pràctica la ciènc...
Com pot ajudar la gestió de les dades de recerca a posar en pràctica la ciènc...
 
Security Human Factor Sustainable Outputs: The Network eAcademy
Security Human Factor Sustainable Outputs: The Network eAcademySecurity Human Factor Sustainable Outputs: The Network eAcademy
Security Human Factor Sustainable Outputs: The Network eAcademy
 
The Research Portal of Catalonia: Growing more (information) & more (services)
The Research Portal of Catalonia: Growing more (information) & more (services)The Research Portal of Catalonia: Growing more (information) & more (services)
The Research Portal of Catalonia: Growing more (information) & more (services)
 
Facilitar la gestión, visibilidad y reutilización de los datos de investigaci...
Facilitar la gestión, visibilidad y reutilización de los datos de investigaci...Facilitar la gestión, visibilidad y reutilización de los datos de investigaci...
Facilitar la gestión, visibilidad y reutilización de los datos de investigaci...
 
La gestión de datos de investigación en las bibliotecas universitarias españolas
La gestión de datos de investigación en las bibliotecas universitarias españolasLa gestión de datos de investigación en las bibliotecas universitarias españolas
La gestión de datos de investigación en las bibliotecas universitarias españolas
 
Disposes de recursos il·limitats? Prioritza estratègicament els teus projecte...
Disposes de recursos il·limitats? Prioritza estratègicament els teus projecte...Disposes de recursos il·limitats? Prioritza estratègicament els teus projecte...
Disposes de recursos il·limitats? Prioritza estratègicament els teus projecte...
 
Les persones i les seves capacitats en el nucli de la transformació digital. ...
Les persones i les seves capacitats en el nucli de la transformació digital. ...Les persones i les seves capacitats en el nucli de la transformació digital. ...
Les persones i les seves capacitats en el nucli de la transformació digital. ...
 
Enginyeria Informàtica: una cursa de fons
Enginyeria Informàtica: una cursa de fonsEnginyeria Informàtica: una cursa de fons
Enginyeria Informàtica: una cursa de fons
 
Transformació de rols i habilitats en un món ple d'IA
Transformació de rols i habilitats en un món ple d'IATransformació de rols i habilitats en un món ple d'IA
Transformació de rols i habilitats en un món ple d'IA
 
Difusió del coneixement a l'Il·lustre Col·legi de l'Advocacia de Barcelona
Difusió del coneixement a l'Il·lustre Col·legi de l'Advocacia de BarcelonaDifusió del coneixement a l'Il·lustre Col·legi de l'Advocacia de Barcelona
Difusió del coneixement a l'Il·lustre Col·legi de l'Advocacia de Barcelona
 
Fons de discos perforats de cartró
Fons de discos perforats de cartróFons de discos perforats de cartró
Fons de discos perforats de cartró
 
Biblioteca Digital Gencat
Biblioteca Digital GencatBiblioteca Digital Gencat
Biblioteca Digital Gencat
 
El fons Enrique Tierno Galván: recepció, tractament i difusió
El fons Enrique Tierno Galván: recepció, tractament i difusióEl fons Enrique Tierno Galván: recepció, tractament i difusió
El fons Enrique Tierno Galván: recepció, tractament i difusió
 
El CIDMA: més enllà dels espais físics
El CIDMA: més enllà dels espais físicsEl CIDMA: més enllà dels espais físics
El CIDMA: més enllà dels espais físics
 
Els serveis del CSUC per a la comunitat CCUC
Els serveis del CSUC per a la comunitat CCUCEls serveis del CSUC per a la comunitat CCUC
Els serveis del CSUC per a la comunitat CCUC
 

Kürzlich hochgeladen

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 

Kürzlich hochgeladen (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 

Introduction to SLURM

  • 1. Introduction to SLURM Ismael Fernández Pavón Cristian Gomollon Escribano 19 / 02 / 2020
  • 3. What is SLURM? • Allocates access to resources for some duration of time. • Provides a framework for starting, executing, and monitoring work (normally a parallel job). • Arbitrates contention for resources by managing a queue of pending work. Cluster manager and job scheduler system for large and small Linux clusters.
  • 4. LoadLeveler (IBM) LSF SLURM PBS Pro Resource Managers Scheduler What is SLURM? ALPS (Cray) Torque Maui Moab
  • 5. ✓ Open source ✓ Fault-tolerant ✓ Highly scalable LoadLeveler (IBM) LSF SLURM PBS Pro Resource Managers Scheduler What is SLURM? ALPS (Cray) Torque Maui Moab
  • 7. SLURM: Resource Management Cluster: Collection of many separate servers (nodes), connected via a fast interconnect.
  • 8. Node CPU (Core) CPU (Thread) SLURM: Resource Management Nodes: • Baseboards, Sockets, Cores, Threads, (CPUs) • Memory size • Generic resources (GRES) • Features • State GPGPU (GRES) Individual computer component of an HPC system.
  • 9. SLURM: Resource Management Partitions: • Associatedwith specific set of nodes • Nodes can be in more than one partition • Job size and time limits • Access control list • State information Partitions Logical group of nodes with common specs.
  • 10. Allocated cores SLURM: Resource Management Allocated memory Jobs: • ID (a number) • Name • Time limit • Size specification • Other Jobs Dependency • State Allocations of resources assigned to a user for a specified amount of time.
  • 11. Core used SLURM: Resource Management Memory used Jobs Step: • ID (a number) • Name • Time limit • Size specification Sets of (possibly parallel) tasks within a job.
  • 12. SLURM: Resource Management FULL CLUSTER Job scheduling time!
  • 13. SLURM: Job Scheduling Scheduling: The process of determining next job to run and on which resources.
  • 14. SLURM: Job Scheduling Scheduling: The process of determining next job to run and on which resources. FIFO Scheduling Resources
  • 15. SLURM: Job Scheduling Scheduling: The process of determining next job to run and on which resources. FIFO Scheduling Backfill Scheduling • Job priority • Time limit (Important!) Time Resources
  • 16. SLURM: Job Scheduling Backfill Scheduling: • Based on the job request, resources available, and policy limits imposed. • Starts with job priority. • Higher priority jobs cannot be delayed by lower priority jobs. • Expected start time of pending jobs depends upon the expected completion time of running jobs, reasonably accurate time limits. • Results in a resource allocation over a period.
  • 17. Backfill Scheduling: • Ej: New lower priority job SLURM: Job Scheduling Elapsed time Time limit Time Resources
  • 18. Backfill Scheduling: • Ej: New lower priority job Time Resources SLURM: Job Scheduling Submit Elapsed time Time limit
  • 19. Backfill Scheduling: • Ej: New lower priority job SLURM: Job Scheduling Time Resources Elapsed time Time limit
  • 20. Backfill Scheduling: • Ej: New lower priority job SLURM: Job Scheduling Time Resources Wait time: 7 Elapsed time Time limit
  • 21. Backfill Scheduling: • Ej: New lower priority job Time Resources SLURM: Job Scheduling Elapsed time Time limit
  • 22. Backfill Scheduling: • Ej: New lower priority job Time Resources SLURM: Job Scheduling Submit Elapsed time Time limit
  • 23. Backfill Scheduling: • Ej: New lower priority job SLURM: Job Scheduling Time Resources Elapsed time Time limit
  • 24. Backfill Scheduling: • Ej: New lower priority job SLURM: Job Scheduling Time Resources Wait time: 1 Elapsed time Time limit
  • 25. SLURM: Job Scheduling Backfill Scheduling: • Starts with job priority. Job_priority = = site_factor + + (PriorityWeightQOS) * (QOS_factor) + + (PriorityWeightPartition) * (partition_factor) + + (PriorityWeightFairshare) * (fair-share_factor) + + (PriorityWeightAge) * (age_factor) + + (PriorityWeightJobSize) * (job_size_factor) + + (PriorityWeightAssoc) * (assoc_factor) + + SUM(TRES_weight_<type> * TRES_factor_<type>…) − nice_factor
  • 26. SLURM: Job Scheduling Backfill Scheduling: • Starts with job priority. Job_priority = = site_factor + + (PriorityWeightQOS) * (QOS_factor) + + (PriorityWeightPartition) * (partition_factor) + + (PriorityWeightFairshare) * (fair-share_factor) + + (PriorityWeightAge) * (age_factor) + + (PriorityWeightJobSize) * (job_size_factor) + + (PriorityWeightAssoc) * (assoc_factor) + + SUM(TRES_weight_<type> * TRES_factor_<type>…) − nice_factor Fixed value Dynamic value User defined value
  • 27. Backfill Scheduling: • Priority factor: SLURM: Job Scheduling QoS: • Account’s Priority: − Normal − Low QoS
  • 28. Backfill Scheduling: • Priority factor: SLURM: Job Scheduling Partition: • It only affects to RES users: − class_a − class_b − class_c QoS Partition
  • 29. Backfill Scheduling: • Priority factor: SLURM: Job Scheduling Fairshare: • It depends on: • Consumption. • Resources requested. QoS Partition Fairshare
  • 30. Backfill Scheduling: • Priority factor: SLURM: Job Scheduling Age: • Increase priority as more time the job pends on queue. • Max 7 days. • Not valid for dependent jobs! QoS Partition Fairshare Age
  • 31. Backfill Scheduling: • Priority factor: SLURM: Job Scheduling Job size: • Bigger jobs have more priority. • ONLY resources NOT time. QoS Partition Fairshare Age Job size
  • 33. •sbatch – Submit a batch script. •salloc – Request resources for an interactive job. •srun – Start a new task (job step). •scancel – Cancel a job. SLURM: Commands
  • 34. • sinfo – Report system status (nodes, queues, etc.). PARTITION AVAIL TIME NODES STATE NODELIST std* up inf+ 2 mix pirineus[15,21] std* up inf+ 30 alloc pirineus[13-14,16-20,22-44] std-fat up inf+ 3 idle~ pirineus[45,49-50] std-fat up inf+ 3 alloc pirineus[46-48] gpu up inf+ 2 idle~ pirineusgpu[3-4] gpu up inf+ 1 mix pirineusgpu2 knl up inf+ 3 idle~ pirineusknl[2-4] mem up inf+ 1 mix canigo1 class_a up inf+ 1 idle~ pirineus12 class_a up inf+ 2 mix canigo1,pirineus11 class_a up inf+ 8 alloc pirineus[1-6,8-9] class_a up inf+ 2 resv pirineus[7,10] class_c up inf+ 1 idle~ pirineus12 class_c up inf+ 2 mix canigo1,pirineus11 class_c up inf+ 8 alloc pirineus[1-6,8-9] class_c up inf+ 2 resv pirineus[7,10] SLURM: Commands
  • 35. • sinfo – Report system status. -N Node-oriented format information, with one line per node and partition. -p Print information only about the specified partition(s). --Format Specify the information to be displayed. "Nodelist,Partition,StateCompact,CpusState,Memory,Freemem" NODELIST PARTITION STATE CPUS(A/I/O/T) MEMORY FREE_MEM canigo1 class_a mix 112/80/0/192 4643070 2458001 pirineus1 class_a idle~ 0/48/0/48 191904 188950 pirineus2 class_a alloc 48/0/0/48 191904 44123 pirineus3 class_a alloc 48/0/0/48 191904 41831 pirineus4 class_a mix 32/16/0/48 191904 66623 pirineus5 class_a mix 16/32/0/48 191904 162277 pirineus6 class_a alloc 48/0/0/48 191904 82747 pirineus7 class_a idle~ 0/48/0/48 191904 189289 SLURM: Commands
  • 36. • sinfo – Report system status. -s List only a partition state summary with no node state details. PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST std* up infinite 32/0/0/32 pirineus[13-44] std-fat up infinite 3/3/0/6 pirineus[45-50] gpu up infinite 1/2/0/3 pirineusgpu[2-4] knl up infinite 0/3/0/3 pirineusknl[2-4] mem up infinite 1/0/0/1 canigo1 class_a up infinite 10/3/0/13 canigo1,pirineus[1-12] class_b up infinite 10/3/0/13 canigo1,pirineus[1-12] class_c up infinite 10/3/0/13 canigo1,pirineus[1-12] SLURM: Commands
  • 37. • sinfo – Report system status. -s List only a partition state summary with no node state details. TIP: Use system-status. SLURM: Commands +-----------+-------------+-----------------+--------------+------------+ | MACHINE | TOTAL SLOTS | ALLOCATED SLOTS | QUEUED SLOTS | OCCUPATION | +-----------+-------------+-----------------+--------------+------------+ | std nodes | 1536 | 1468 | 2212 | 95 % | | fat nodes | 288 | 144 | 0 | 50 % | | mem nodes | 96 | 96 | 289 | 100 % | | gpu nodes | 144 | 96 | 252 | 66 % | | knl nodes | 816 | 0 | 0 | 0 % | | res nodes | 672 | 648 | 1200 | 96 % | +-----------+-------------+-----------------+--------------+------------+
  • 38. • squeue – Report job and job step status. JOBID PARTIT NAME USER ST TIME NODES NODELIST 1222376 mem dada2 mvelasco PD 0:00 1 (Resources) 1221504 std Freq_TS_ uabqut16 PD 0:00 1 (Resources) 1222346 std Cu2T-tra agusti PD 0:00 1 (Priority) 1222347 std AuIPr_Ph sciortin PD 0:00 1 (Priority) 1220930 std nickeloc ubaqis07 PD 0:00 1 (Priority) 1222351 std g09d1 upceqt04 R 2:18:20 1 pirineus21 1221621 mem C3 vpenya R 23:56:04 1 canigo1 1221569 std preTS_VI porellan R 19:39:13 1 pirineus17 1221543 std Au2-Cl-d agusti R 1-13:40:32 1 pirineus22 1221616 std-fat CuII_mod mariona R 1-10:35:33 1 pirineus47 1221617 std-fat CuIII_mo mariona R 1-10:35:33 1 pirineus48 1221461 std opt-1xe2 pbesalu R 2-11:22:43 1 pirineus37 1221413 std s24ls_de jcirera R 4:08:01 1 pirineus22 1220720 std nickeloc ubaqis07 R 4-03:00:44 2 pirineus[34-35] 1220719 std nickeloc ubaqis07 R 4-03:00:48 1 pirineus14 1221546 mem C60-Zn-T pbesalu R 22:31:12 1 canigo1 SLURM: Commands
• 39. SLURM: Commands
• scontrol – Administrator tool to view and/or update system, job, step, partition or reservation status.
scontrol hold <jobid>
scontrol release <jobid>
scontrol show job <jobid>
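A minimal usage sketch (the job ID is hypothetical):
scontrol hold 1222376      # keep the job pending; squeue will report reason JobHeldUser
scontrol show job 1222376  # inspect the full job record (see the next slide)
scontrol release 1222376   # make the job eligible for scheduling again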
• 40. SLURM: Commands
JobId=1222543 JobName=test_large_g16.slm
UserId=ifernandez(80347) GroupId=csuc(10000) MCS_label=N/A
Priority=100209 Nice=0 Account=csuc QOS=test
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=05:04:05 TimeLimit=1-00:00:00 TimeMin=N/A
SubmitTime=2020-01-16T09:55:19 EligibleTime=2020-01-16T09:55:19
AccrueTime=2020-01-16T09:55:19
StartTime=2020-01-16T09:55:20 EndTime=2020-01-17T09:55:21 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-01-16T09:55:20
Partition=std AllocNode:Sid=192.168.19.26:7243
ReqNodeList=(null) ExcNodeList=(null)
NodeList=pirineus17 BatchHost=pirineus17
NumNodes=1 NumCPUs=4 NumTasks=1 CPUs/Task=4 ReqB:S:C:T=0:0:*:*
TRES=cpu=4,mem=15600M,node=1,billing=4
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=4 MinMemoryCPU=3900M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/ifernandez/examples/gaussian/g16/large/test_large_g16.slm
WorkDir=/home/ifernandez/examples/gaussian/g16/large
StdErr=/home/ifernandez/examples/gaussian/g16/large/slurm-1222543.out
StdIn=/dev/null
StdOut=/home/ifernandez/examples/gaussian/g16/large/slurm-1222543.out
Power=
• 42. SLURM: Job Life
[State diagram: a job enters PENDING (possibly CONFIGURING) at SUBMISSION, then moves to RUNNING. HOLD moves a pending job to HELD and RELEASE returns it to PENDING; REQUEUE sends a running job back to PENDING. A running job may RESIZE, and it finishes via COMPLETING into one of the terminal states: COMPLETED, CANCELED, TIMEOUT, FAIL, OUT OF MEMORY, SPECIAL EXIT or NODE FAIL.]
• 45. SLURM: Job Life
Pending Reasons:
• Priority: One or more higher-priority jobs exist for this partition or advanced reservation.
• Resources: The job is waiting for resources to become available.
• Reservation: The job is waiting for its advanced reservation to become available.
• ReqNodeNotAvail: Some node specifically required by the job is not currently available.
• JobHeldAdmin / JobHeldUser: The job is held by a system administrator / by the user.
• Dependency: This job is waiting for a dependent job to complete.
• BadConstraints: The job's constraints cannot be satisfied.
• InvalidQOS: The job's QOS is invalid. Account's assigned time exhausted?
• AssociationTimeLimit: The job's association has reached its time limit. Account's assigned time exhausted?
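To check why your own jobs are pending, the reason is exposed through squeue's %r format field (a sketch built from standard squeue format specifiers):
squeue -u $USER -t PENDING -o "%.10i %.9P %.8T %.20r"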
• 48. SLURM: News
• SLURM upgrade to 19.05:
• New job state: OUT_OF_MEMORY (job killed by the OOM killer).
• Fixed ratio between memory and CPU:
Partition  Max. mem per CPU (MB)  Max. mem per CPU (GB)
std        3900 MB                3.8 GB
std-fat    7900 MB                7.7 GB
mem        24180 MB               23.6 GB
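In practice this means sizing memory and cores together. A sketch for the std partition, consistent with the 3900 MB/CPU limit above:
#SBATCH --partition=std
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=3900    # 4 x 3900 MB = 15600 MB total, the most std grants for 4 cores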
  • 51. How to launch jobs?
• 52. Login on CSUC infrastructure
• Login
ssh -p 2122 username@hpc.csuc.cat
• Transfer files
scp -P 2122 local_file username@hpc.csuc.cat:[path to your folder]
sftp -oPort=2122 username@hpc.csuc.cat
• Useful paths
Name                      Variable                  Availability        Quota/project   Time limit  Backup
/home/$user               $HOME                     global              >64 GB          unlimited   Yes
/scratch/$user            $SCRATCH                  global              unlimited       30 days     No
/scratch/$user/tmp/jobid  $TMPDIR / $SHAREDSCRATCH  global              job file limit  1 week      No
/tmp/$user/jobid          $TMPDIR / $LOCALSCRATCH   local to each node  job file limit  1 week      No
• Get HC consumption
consum -a 'year'                 (group consumption)
consum -a 'year' -u 'username'   (user consumption)
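If you connect often, an entry in ~/.ssh/config saves retyping the port (a convenience sketch; 'csuc' is an arbitrary alias):
Host csuc
    HostName hpc.csuc.cat
    Port 2122
    User username
Afterwards 'ssh csuc' and 'scp local_file csuc:folder/' work without the -p/-P flags.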
• 53. Batch job submission: Default settings
• 4-8 GB/core (std and std-fat partitions).
• 24 GB/core on the mem partition.
• 1 core on the std, std-fat and mem partitions.
• 24 cores and 1 GPU on the gpu partition.
• The whole node on the knl partition.
• Non-exclusive, multinode jobs.
• Working and output directories default to the submit directory.
• 54. Batch job submission
• Basic Linux commands:
Description                   Command  Example
List files                    ls       ls /home/user
Make a folder                 mkdir    mkdir /home/test
Change folder                 cd       cd /home/test
Copy a file                   cp       cp filename1 filename2
Move a file                   mv       mv /home/test.txt /cescascratch/test.txt
Delete a file                 rm       rm filename
Print file content            cat      cat filename
Find a string in files        grep     grep 'word' filename
List the last lines of a file tail     tail filename
• Text editors: vim, nano, emacs, etc.
• More detailed info and options about each command:
'command' --help
man 'command'
• 55. Batch job submission: The slurm submit script
#!/bin/bash
#SBATCH -J JOB_NAME                   # scheduler directives
#SBATCH -o OUTPUT_FILE.log
#SBATCH -e ERROR_FILE.err
#SBATCH -p PARTITION
#SBATCH --mem=TOTMEM
#SBATCH -n NTASKS
#SBATCH -c NCORES_PER_TASK
module load mpi/intel/openmpi/3.1.0   # set up the environment variables and paths
cp -r $INPUT_DIR $SCRATCH             # move the input files to the working directory
cd $SCRATCH
srun $APPLICATION                     # launch the application (similar to mpirun)
mkdir -p $OUTPUT_DIR                  # create the output folder and move the outputs
cp -r * $OUTPUT_DIR
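Once the script is saved (e.g. as job.slm; the filename is arbitrary), the submit/monitor/cancel cycle looks like this:
sbatch job.slm        # prints 'Submitted batch job <jobid>'
squeue -u $USER       # follow the job state (PD, R, ...)
scancel <jobid>       # cancel the job if something went wrong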
• 56. Scheduler directives/Options: #SBATCH
• -c, --cpus-per-task=ncpus    number of CPUs required per task
• --gres=list                  required generic resources
• -J, --job-name=jobname       name of the job
• -n, --ntasks=ntasks          number of tasks to run
• --ntasks-per-node=n          number of tasks to invoke on each node
• -N, --nodes=N                number of nodes on which to run (N = min[-max])
• -o, --output=out             file for the batch script's standard output
• -p, --partition=partition    partition requested
• -t, --time=minutes           time limit (format: dd-hh:mm)
• 57. Scheduler directives/Options: #SBATCH
• -C, --constraint=list        specify a list of constraints (mem, vnc, ...)
• --mem=MB                     minimum amount of total real memory
• --reservation=name           allocate resources from the named reservation
• -w, --nodelist=hosts...      request a specific list of hosts
• --mem-per-cpu=MB             amount of real memory per allocated core
• -t, --time=minutes           job max duration (mandatory!)
More commands/info: type 'sbatch -h'.
• 58. How to generate slurm script files: 1. Identify app parallelism
Thread parallelism:
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=NCORES
Process parallelism:
#SBATCH --ntasks=NCORES
#SBATCH --cpus-per-task=1
(See the launch sketch below.)
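How each pattern reaches the application, as a minimal sketch (my_threaded_app and my_mpi_app are placeholder binaries):
# Thread parallelism (e.g. OpenMP): one task, many threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_threaded_app
# Process parallelism (e.g. MPI): one process per task, launched by srun
srun ./my_mpi_app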
• 59. How to generate slurm script files: 2. Determine the memory requirements
The partition choice strongly depends on the job's memory requirements!
Partition    Memory/core
std/gpu      4 GB
std-fat/KNL  8 GB
mem          24 GB
Four equivalent ways to request ~64 GB:
#SBATCH --mem=63900
#SBATCH --cpus-per-task=16
#SBATCH --partition=std

#SBATCH --mem=63900
#SBATCH --cpus-per-task=8
#SBATCH --partition=std-fat

#SBATCH --mem=63900
#SBATCH --cpus-per-task=4
#SBATCH --partition=mem

#SBATCH --mem-per-cpu=3900
#SBATCH --ntasks=16
#SBATCH --partition=std
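A quick sanity check behind those numbers, as a sketch: with the nominal 4 GB/core of std, a 63900 MB job needs at least ceil(63900 / 4096) = 16 cores, which is why the std variants above pair --mem=63900 with 16 CPUs; on mem (24 GB/core) the same memory fits comfortably in 4 cores.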
• 60. How to generate slurm script files: 3. Runtime requirements
#SBATCH --time=Thpc
Performance comparison:
WORKSTATION: 4 cores (Nws), 8-16 GB RAM, 1 TB disk at 600 MB/s, 1-10 Gb/s Ethernet
HPC NODE:    48 cores (Nhpc), 192 GB RAM, 200 TB disk at 4 GB/s, 100-200 Gb/s InfiniBand
At first approximation: Thpc ≈ Tws × (Nws / Nhpc)
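A worked example of that rule of thumb (the 10 h figure is hypothetical): a run taking Tws = 10 h on a 4-core workstation should need roughly Thpc ≈ 10 h × 4/48 ≈ 50 min on a 48-core node, assuming the code scales well. Set --time somewhat above the estimate: jobs that hit the limit end in TIMEOUT.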
• 61. How to generate slurm script files: 4. Disk/IO requirements
Two kinds of applications:
• Threaded/serial (only one node):
cd $SHAREDSCRATCH or cd $LOCALSCRATCH
• Multitask (multinode):
cd $SHAREDSCRATCH
Or let SLURM decide for you:
cd $SCRATCH
(A staging sketch follows.)
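A minimal staging sketch for a single-node job using node-local scratch ($INPUT_DIR and my_app are placeholders):
cd $LOCALSCRATCH               # node-local disk: fastest I/O, not visible from other nodes
cp -r $INPUT_DIR/* .           # stage the inputs in
./my_app                       # run the application
cp -r output/ $HOME/case1/     # copy back only the files you need before the job ends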
• 62. How to generate slurm script files: Summary
1. Identify your application's parallelism.
2. Estimate the amount of resources needed by your solving algorithm.
3. Estimate the runtime as accurately as possible.
4. Determine your job's I/O and input requirements.
5. Determine which output files are necessary and save only those files in your own disk space.
• 63. Gaussian 16 (Threaded Example)
#!/bin/bash
#SBATCH -J gau16_test
#SBATCH -o gau_test_%j.log
#SBATCH -e gau_test_%j.err
#SBATCH -n 1                   # threaded application: 1 task...
#SBATCH -c 16                  # ...with 16 cores
#SBATCH -p std                 # less than 4 GB/core: std partition
#SBATCH --mem=30000
#SBATCH --time=10-00           # 10 days runtime
module load gaussian/g16b1     # set up the environment to run the app
INPUT_DIR=$HOME/gaussian_test/inputs
OUTPUT_DIR=$HOME/gaussian_test/outputs
cd $SCRATCH
cp -r $INPUT_DIR/* .
g16 < input.gau > output.out
mkdir -p $OUTPUT_DIR
cp output.out $OUTPUT_DIR
• 64. Vasp 5.4.4 (Multitask Example)
#!/bin/bash
#SBATCH -J vasp_test
#SBATCH -o vasp_test_%j.log
#SBATCH -e vasp_test_%j.err
#SBATCH -n 24                  # multitask application: 24 tasks...
#SBATCH -c 1                   # ...with 1 core each
#SBATCH --mem-per-cpu=7500     # more than 4 GB/core but less than 8: std-fat partition
#SBATCH -p std-fat
#SBATCH --time=20:00           # 20 min runtime
module load vasp/5.4.4         # set up the environment to run the app
INPUT_DIR=$HOME/vasp_test/inputs
OUTPUT_DIR=$HOME/vasp_test/outputs
cd $SCRATCH
cp -r $INPUT_DIR/* .
srun `which vasp_std`          # multitask apps require the 'srun' command
mkdir -p $OUTPUT_DIR
cp -r * $OUTPUT_DIR
• 65. Gromacs (Multitask and Threaded Example)
#!/bin/bash
#SBATCH --job-name=gromacs
#SBATCH --output=gromacs_%j.out
#SBATCH --error=gromacs_%j.err
#SBATCH -n 24                  # hybrid job: 24 tasks...
#SBATCH -c 2                   # ...with 2 threads each
#SBATCH -N 1                   # on 1 node
#SBATCH -p gpu
#SBATCH --gres=gpu:2           # 2 GPUs/node on the gpu partition
#SBATCH --time=00:30:00
module load gromacs/2018.4_mpi
cd $SHAREDSCRATCH
cp -r $HOME/SLMs/gromacs/CASE/* .
srun `which gmx_mpi` mdrun -v -deffnm input_system -ntomp $SLURM_CPUS_PER_TASK -nb gpu -npme 12 -dlb yes -pin on -gpu_id 01
cp -r * /scratch/$USER/gromacs/CASE/output/
• 66. ANSYS Fluent (Multitask Example)
#!/bin/bash
#SBATCH -J truck.cas
#SBATCH -o truck.log
#SBATCH -e truck.err
#SBATCH -p std
#SBATCH -n 16
#SBATCH --time=10-20:00
module load toolchains/gcc_mkl_ompi
INPUT_DIR=$HOME/FLUENT/inputs
OUTPUT_DIR=$HOME/FLUENT/outputs
cd $SCRATCH
cp -r $INPUT_DIR/* .
/prod/ANSYS16/v162/fluent/bin/fluent 3ddp -t$SLURM_NTASKS -mpi=hp -g -i input1_50.txt
mkdir -p $OUTPUT_DIR
cp -r * $OUTPUT_DIR
• 67. Best Practices
• Use $SCRATCH as the working directory.
• Move only the necessary files (not all the files in the folder each time).
• Try to keep important files only in $HOME.
• Choose the partition and resources that best fit your job.
  • 68. Thank you for your attention!