HPC @ Uni.lu

High Performance Computing in Luxembourg

This website is deprecated. The old pages are kept online, but you should refer in priority to the new website hpc.uni.lu and the new technical documentation site hpc-docs.uni.lu.

Partition (queue), node and license status

  • Show queued jobs, show more details (‘long’ view that includes the job time limit):
squeue
squeue -l
  • Show only the queued jobs of your user ($USER is an environment variable in your shell), then for another specific user:
squeue -u $USER
squeue -u vplugaru
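    • the output columns can be customized with squeue's -o/--format option, e.g. (the field selection and widths below are illustrative):
squeue -u $USER -o "%.10i %.9P %.30j %.8T %.10M %.6D %R"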
  • Show queued jobs in a specific partition:
squeue -p $partition
  • Show queued jobs that are in a specific state (pending / running / failed / preempted, see man squeue for all available states):
squeue -t PD
squeue -t R
squeue -t F
squeue -t PR
  • Show partition status, summarized status (without node state), and node-oriented partition status:
sinfo
sinfo -s
sinfo -N
  • Show node details including available features (to be used with the -C option of sbatch/srun):
sinfo -l -N
  • Show node reservations that have been created by the administrators for specific users or accounts:
sinfo -T
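    • if the administrators have granted you access to a reservation, you can submit jobs into it with the standard --reservation option ($reservationname below is a placeholder for the actual reservation name):
sbatch --reservation=$reservationname job.sh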
  • Show node details (all nodes, specific node):
scontrol show nodes
scontrol show nodes $nodename
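    • the per-node output is verbose; to extract just a few fields of interest you can filter it, e.g.:
scontrol show nodes $nodename | grep -E 'State|CPUAlloc|RealMemory'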
  • Check the default account your jobs will use:
sacctmgr show user $USER format=user%20s,defaultaccount%30s
  • See all account associations for your user and the QOS they grant access to:
sacctmgr list association where users=$USER format=account%30s,user%20s,qos%120s
  • See configured licenses and their status (number of tokens used and free):
scontrol show licenses

Job submission and management

Starting interactive jobs

  • Start an interactive job with the default number of cores and walltime:
srun -p interactive --pty bash -i
  • Start an interactive job for 30 minutes, with 2 nodes and 4 tasks per node:
srun -p interactive --time=0:30:0 -N 2 --ntasks-per-node=4 --pty bash -i
  • Start an interactive job with X11 forwarding such that GUI applications (running in the cluster) will be shown on your workstation:
    • note that your initial connection to the iris cluster needs to have X11 Forwarding enabled, e.g. ssh -X iris-cluster
srun -p interactive --pty --x11 bash -i
  • Start a best-effort interactive job (can be interrupted by regular jobs if other users submit them):
srun -p interactive --qos besteffort --pty bash -i
  • Start an interactive job asking for 8 Allinea Forge (DDT/MAP) licenses:
srun -p interactive -L forge:8 --pty bash -i
  • Start an interactive job asking for 8 Allinea Forge (DDT/MAP) licenses and 16 Allinea Performance Reports licenses:
srun -p interactive -L forge:8,perfreport:16 --pty bash -i

Note:

  • To make interactive jobs easier to launch, add to your ~/.bashrc:
    • alias si='srun -p interactive --pty bash -i' or e.g.
    • alias si='srun -p interactive --time=0:30:0 --pty bash -i'

Submitting passive jobs

We maintain a page dedicated to examples of SLURM batch (launcher) scripts that you can use for your batch jobs.

  • Submit to the queue a job script (job launcher) in which you’ve added SLURM directives (#SBATCH $directive) with the job specification (name, number of requested nodes, memory, walltime, etc.):
sbatch job.sh
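    • for reference, a minimal sketch of such a launcher script (all #SBATCH values below are illustrative, adjust them to your needs):
#!/bin/bash -l
#SBATCH --job-name=myjob        # job name (illustrative)
#SBATCH --nodes=1               # number of requested nodes
#SBATCH --ntasks=4              # total number of tasks
#SBATCH --time=0-01:00:00       # walltime as D-HH:MM:SS
#SBATCH --partition=batch       # target partition

srun /path/to/your/application  # replace with your actual program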
  • Submit a job script, overriding on the command line the number of requested nodes:
sbatch -N 2 job.sh
  • Submit a job script to the batch partition:
sbatch -p batch job.sh
  • Submit a job script to the long partition that permits a long walltime:
sbatch -p long job.sh
  • Submit a job script to the batch partition, requesting only nodes with Broadwell CPUs:
sbatch -p batch -C broadwell job.sh
  • Submit a job script to the batch partition, requesting only nodes with Skylake (AVX-512 ISA) CPUs:
sbatch -p batch -C skylake job.sh
  • Submit a job script to the gpu partition, requesting 2 cores and 2 GPUs on a single node:
sbatch -N 1 -n 2 --gpus=2 -p gpu job.sh
  • Submit a job script to the gpu partition, requesting 2 cores and 2 GPUs on a single node, each GPU with 32GB on-board memory:
sbatch -N 1 -n 2 --gpus=2 -C volta32 -p gpu job.sh
  • Submit a job script to the gpu partition, requesting 4 nodes with 2 cores/node and 4 GPUs/node:
sbatch -N 4 --ntasks-per-node=2 --gpus-per-node=4 -p gpu job.sh
  • Submit a job script to the bigmem partition, requesting 64 tasks (with 1 core/task) and 2TB of RAM on a single node:
sbatch -N 1 -n 64 --mem=2T -p bigmem job.sh
  • Submit a job script to the bigmem partition, requesting the full node (112 cores and all associated RAM, ~3TB):
sbatch -N 1 -n 112 -p bigmem job.sh
  • Submit a job script and request a specific start time:
    1. current day at a precise hour
    2. relative to a moment in time: now, today and tomorrow are recognized keywords, to be combined with seconds (the default), minutes, hours, days or weeks as time units
    3. relative to a moment in time combining time specifications
    4. specific date and hour
sbatch --begin=16:00 job.sh
sbatch --begin=tomorrow job.sh
sbatch --begin=now+2hours job.sh
sbatch --begin=2017-06-23T07:30:00 job.sh
  • Submit a best-effort job to the batch partition (can be interrupted by regular jobs if other users submit them; see the note below on requeueing):
sbatch -p batch --qos besteffort job.sh
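
Note:

  • What happens to a preempted best-effort job depends on the cluster's preemption configuration; if you want it to be eligible for automatic requeueing, you can additionally pass sbatch's standard --requeue flag:
sbatch -p batch --qos besteffort --requeue job.sh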

Collecting job information

  • Show the details of a job:
scontrol show job $jobid
  • Check waiting job priority (detailed view):
sprio -l
  • Check expected job start time:
squeue --start -u $USER
  • Show running job (and steps) system-level utilization (memory, I/O, energy):
    • note that sstat information is limited to your own jobs
sstat -j $jobid
  • Show specific statistics from a running job (and steps) or multiple jobs:
    • use sstat -e to see a list of available output fields
sstat -j $jobid --format=AveCPU,AveRSS,AveVMSize,MaxRSS,MaxVMSize
sstat -j $jobid1,$jobid2 --format=AveCPU,AveRSS,AveVMSize,MaxRSS,MaxVMSize
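    • to follow these statistics while the job runs, you can wrap sstat in the standard watch utility, e.g. refreshing every 60 seconds:
watch -n 60 sstat -j $jobid --format=AveCPU,AveRSS,MaxRSS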
  • Output the statistics in a parseable format, delimited by | (with, then without trailing |):
sstat -p -j $jobid --format=AveCPU,AveRSS,AveVMSize,MaxRSS,MaxVMSize
sstat -P -j $jobid --format=AveCPU,AveRSS,AveVMSize,MaxRSS,MaxVMSize
  • Show running or completed job (and steps) system-level utilization from the accounting information, and with full details:
sacct -j $jobid
sacct -j $jobid -l
  • Show statistics relevant to the job allocation itself, not taking steps into consideration, and with more details:
sacct -X -j $jobid
sacct -X -j $jobid -l
  • Show a subset of interesting statistics from a completed job and its steps, including:
    1. elapsed time, both human-readable and as a total number of seconds
    2. maximum resident set size of all tasks in the job (you may also want to add maxrssnode and maxrsstask for a better understanding of which node/task consumed the memory)
    3. maximum virtual memory size (idem for maxvmsizenode and maxvmsizetask)
    4. consumed energy (in Joules); be aware there are many caveats:
      • your job needs to be the only one running on the corresponding compute nodes
      • the RAPL mechanism does not take into account all hardware elements which consume power (only CPUs, GPUs and DRAM are included)
sacct -j $jobid --format=account,user,jobid,jobname,partition,state,elapsed,elapsedraw,start,end,maxrss,maxvmsize,consumedenergy,consumedenergyraw,nnodes,ncpus,nodelist
  • Output the same statistics in the parseable |-delimited format, for a single and multiple jobs:
    • use sacct -e to see a list of available output fields
sacct -p -j $jobid --format=account,user,jobid,jobname,partition,state,elapsed,elapsedraw,start,end,maxrss,maxvmsize,consumedenergy,consumedenergyraw,nnodes,ncpus,nodelist
sacct -p -j $jobid1,$jobid2 --format=account,user,jobid,jobname,partition,state,elapsed,elapsedraw,start,end,maxrss,maxvmsize,consumedenergy,consumedenergyraw,nnodes,ncpus,nodelist
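    • the |-delimited output is easy to post-process with other tools, e.g. realigned into a readable table with the standard column utility:
sacct -P -j $jobid --format=jobid,jobname,state,elapsed,maxrss | column -t -s '|'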
  • Show statistics for all personal jobs started since a particular date, then without job steps:
sacct --starttime 2017-05-01 -u vplugaru
sacct -X --starttime 2017-05-01 -u vplugaru

Pausing, resuming and cancelling jobs

  • To stop a waiting job from being scheduled and later to allow it to be scheduled:
scontrol hold $jobid
scontrol release $jobid
  • To pause a running job and then resume it:
scontrol suspend $jobid
scontrol resume $jobid
  • To remove a job from the queue (stopping it if already started):
scancel $jobid
  • To remove a job by name:
scancel --name=$jobname
scancel -n $jobname
  • To remove all user jobs:
scancel --user=$USER
scancel -u $USER
  • To remove all waiting jobs (pending state) for a given user:
scancel --user=$USER --state=pending
scancel -u $USER -t pending
  • To remove all waiting jobs in a given partition (e.g. batch):
scancel -u $USER --partition=batch
scancel -u $USER -p batch
  • To stop and restart a given job:
scontrol requeue $jobid
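
Note:

  • To hold all of your pending jobs at once (and later release them), you can combine squeue and scontrol; a sketch relying on squeue's -h (no header) and -o %i (job ID only) output options:
squeue -u $USER -t PD -h -o %i | xargs -r -n1 scontrol hold
squeue -u $USER -t PD -h -o %i | xargs -r -n1 scontrol release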