This website is deprecated. The old pages are kept online, but you should refer first to the new website hpc.uni.lu and the new technical documentation site hpc-docs.uni.lu.
SLURM Examples
Partition (queue), node and licenses status
- Show queued jobs, show more details (‘long’ view that includes the job time limit):
- Show only the queued jobs of your user (`$USER` is an environment variable in your shell), then for another specific user:
- Show queued jobs in a specific partition:
- Show queued jobs that are in a specific state (pending / running / failed / preempted, see `man squeue` for all available states):
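As a minimal sketch, the queue queries above map onto standard `squeue` options roughly as follows. The `run` helper and the user name `jdoe` are hypothetical and only serve the illustration; by default the helper prints each command instead of executing it.

```shell
# Hypothetical dry-run helper (not part of SLURM): prints each command,
# or executes it when DRY_RUN=0 is set on a cluster login node.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

other_user="jdoe"            # hypothetical user name

run squeue                   # all queued jobs
run squeue -l                # long view, includes the job time limit
run squeue -u "$USER"        # only your own jobs
run squeue -u "$other_user"  # jobs of another user
run squeue -p batch          # jobs in one partition
run squeue -t PD             # pending jobs
run squeue -t R              # running jobs
run squeue -t F              # failed jobs
run squeue -t PR             # preempted jobs
```

Setting `DRY_RUN=0` before running the block on a login node executes the commands for real.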
- Show partition status, summarized status (without node state), and node-oriented partition status:
- Show node details including available features (to be used with the `-C` option of sbatch/srun):
- Show node reservations that have been created by the administrators for specific users or accounts:
- Show node details (all nodes, specific node):
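The partition, reservation and node queries above could be written along these lines. This is a sketch using standard `sinfo` and `scontrol` options; the `run` helper and the node name `iris-001` are hypothetical.

```shell
# Hypothetical dry-run helper (not part of SLURM): prints each command,
# or executes it when DRY_RUN=0 is set on a cluster login node.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

node="iris-001"                  # hypothetical node name

run sinfo                        # partition status
run sinfo -s                     # summarized status, without node state
run sinfo -N                     # node-oriented status
run sinfo -N -l                  # long node view, lists available features
run sinfo -T                     # reservations
run scontrol show nodes          # details of all nodes
run scontrol show node "$node"   # details of one node
```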
- Check the default account your jobs will use:
- See all account associations for your user and the QOS they grant access to:
- See configured licenses and their status (#tokens used and free):
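For the account and license queries above, one plausible set of commands is sketched below. The selection of `format=` fields is an assumption chosen for readability, and the `run` helper is hypothetical.

```shell
# Hypothetical dry-run helper (not part of SLURM): prints each command,
# or executes it when DRY_RUN=0 is set on a cluster login node.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Default account used by your jobs
run sacctmgr show user "$USER" format=user,defaultaccount

# All account associations of your user and the QOS they give access to
run sacctmgr show associations where users="$USER" format=account,user,qos

# Configured licenses with the number of tokens used and free
run scontrol show licenses
```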
Job submission and management
Starting interactive jobs
- Start an interactive job with the default number of cores and walltime:
- Start an interactive job for 30 minutes, with 2 nodes and 4 tasks per node:
- Start an interactive job with X11 forwarding such that GUI applications (running in the cluster) will be shown on your workstation:
  - note that your initial connection to the iris cluster needs to have X11 forwarding enabled, e.g. `ssh -X iris-cluster`
- Start a best-effort interactive job (can be interrupted by regular jobs if other users submit them):
- Start an interactive job requesting 8 Allinea Forge (DDT/MAP) licenses:
- Start an interactive job requesting 8 Allinea Forge (DDT/MAP) licenses and 16 Allinea Performance Reports licenses:
Note:
- To make interactive jobs easier to launch, add to your `~/.bashrc`: `alias si='srun -p interactive --pty bash -i'` or, e.g., `alias si='srun -p interactive --time=0:30:0 --pty bash -i'`
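Building on the `srun -p interactive --pty bash -i` form used in the alias, the interactive job variants listed in this section might look as follows. This is a sketch under several assumptions: the QOS name `qos-besteffort` and the license names `forge` and `perfreport` are site-specific guesses, `--x11` requires a SLURM build with native X11 support, and the `run` helper is hypothetical.

```shell
# Hypothetical dry-run helper (not part of SLURM): prints each command,
# or executes it when DRY_RUN=0 is set on a cluster login node.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

besteffort_qos="qos-besteffort"  # assumed site-specific QOS name

# Default number of cores and walltime
run srun -p interactive --pty bash -i

# 30 minutes, 2 nodes, 4 tasks per node
run srun -p interactive --time=0:30:0 -N 2 --ntasks-per-node=4 --pty bash -i

# X11 forwarding (assumes native X11 support in the installed SLURM)
run srun -p interactive --x11 --pty bash -i

# Best-effort job
run srun -p interactive --qos "$besteffort_qos" --pty bash -i

# 8 Forge licenses, then 8 Forge plus 16 Performance Reports licenses
# (license names are assumptions; list the real ones with
# 'scontrol show licenses')
run srun -p interactive -L forge:8 --pty bash -i
run srun -p interactive -L forge:8,perfreport:16 --pty bash -i
```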
Submitting passive jobs
We maintain a page dedicated to examples of SLURM batch (launcher) scripts that you can use for your batch jobs.
- Submit to the queue a job script (job launcher) in which you’ve added SLURM directives (`#SBATCH $directive`) with the job specification (name, number of requested nodes, memory, walltime, etc.):
- Submit a job script, overriding on the command line the number of requested nodes:
- Submit a job script to the `batch` partition:
- Submit a job script to the `long` partition that permits a long walltime:
- Submit a job script to the `batch` partition, requesting only nodes with Broadwell CPUs:
- Submit a job script to the `batch` partition, requesting only nodes with Skylake (AVX-512 ISA) CPUs:
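The basic submissions above can be sketched with standard `sbatch` options. The script name `job.sh` and the `run` helper are hypothetical, and the feature names `broadwell` and `skylake` are assumptions about how the site labels its nodes (the long node view of `sinfo` shows the real ones).

```shell
# Hypothetical dry-run helper (not part of SLURM): prints each command,
# or executes it when DRY_RUN=0 is set on a cluster login node.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

script="job.sh"                           # hypothetical launcher script

run sbatch "$script"                      # use the #SBATCH directives as-is
run sbatch -N 2 "$script"                 # override the number of nodes
run sbatch -p batch "$script"             # batch partition
run sbatch -p long "$script"              # long partition
run sbatch -p batch -C broadwell "$script"  # assumed feature name
run sbatch -p batch -C skylake "$script"    # assumed feature name
```

Options given on the command line take precedence over the matching `#SBATCH` directives in the script.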
- Submit a job script to the `gpu` partition, requesting 2 cores and 2 GPUs on a single node:
- Submit a job script to the `gpu` partition, requesting 2 cores and 2 GPUs on a single node, each GPU with 32GB on-board memory:
- Submit a job script to the `gpu` partition, requesting 4 nodes with 2 cores/node and 4 GPUs/node:
- Submit a job script to the `bigmem` partition, requesting 64 tasks (with 1 core/task) and 2TB of RAM on a single node:
- Submit a job script to the `bigmem` partition, requesting the full node (112 cores and all associated RAM, ~3TB):
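One way to express the GPU and large-memory requests above is sketched here with generic `sbatch` options (`--gres`, `--mem`). The script name `job.sh`, the `run` helper and the feature name `volta32` for 32GB GPUs are assumptions, and the site may require additional options such as a dedicated QOS.

```shell
# Hypothetical dry-run helper (not part of SLURM): prints each command,
# or executes it when DRY_RUN=0 is set on a cluster login node.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

script="job.sh"                  # hypothetical launcher script

# 2 cores and 2 GPUs on a single node (--gres counts are per node)
run sbatch -p gpu -N 1 -n 2 --gres=gpu:2 "$script"

# Same, restricted to GPUs with 32GB memory (assumed feature name)
run sbatch -p gpu -N 1 -n 2 --gres=gpu:2 -C volta32 "$script"

# 4 nodes with 2 cores and 4 GPUs on each node
run sbatch -p gpu -N 4 --ntasks-per-node=2 --gres=gpu:4 "$script"

# 64 single-core tasks and 2TB of RAM on one bigmem node
run sbatch -p bigmem -N 1 -n 64 -c 1 --mem=2T "$script"

# Full bigmem node: 112 cores; --mem=0 asks for all memory of the node
run sbatch -p bigmem -N 1 -n 112 --mem=0 "$script"
```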
- Submit a job script and request a specific start time:
  - current day at a precise hour
  - relative to a moment in time: now, today, tomorrow are recognized keywords, to be used together with seconds (default), minutes, hours, days, weeks time units
  - relative to a moment in time combining time specifications
  - specific date and hour
- Submit a best-effort job to the `batch` partition (can be interrupted by regular jobs if other users submit them):
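The deferred-start and best-effort submissions above might be written as follows. The `--begin` time formats are the ones documented for `sbatch`; the script name `job.sh`, the example date, the QOS name `qos-besteffort` and the `run` helper are hypothetical or site-specific assumptions.

```shell
# Hypothetical dry-run helper (not part of SLURM): prints each command,
# or executes it when DRY_RUN=0 is set on a cluster login node.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

script="job.sh"                  # hypothetical launcher script
besteffort_qos="qos-besteffort"  # assumed site-specific QOS name

run sbatch --begin=16:00 "$script"                # today at 16:00
run sbatch --begin=tomorrow "$script"             # keyword
run sbatch --begin=now+2hours "$script"           # keyword plus offset
run sbatch --begin=2030-01-15T07:30:00 "$script"  # example date and hour

# Best-effort job in the batch partition
run sbatch -p batch --qos "$besteffort_qos" "$script"
```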
Collecting job information
- Show the details of a job:
- Check waiting job priority (detailed view):
- Check expected job start time:
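A sketch of the three queries above, using standard `scontrol`, `sprio` and `squeue` options. The job ID `123456` and the `run` helper are hypothetical.

```shell
# Hypothetical dry-run helper (not part of SLURM): prints each command,
# or executes it when DRY_RUN=0 is set on a cluster login node.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

jobid=123456                     # hypothetical job ID

run scontrol show job "$jobid"   # full job details
run sprio -l                     # priority factors of waiting jobs, long view
run squeue --start -u "$USER"    # expected start time of your pending jobs
```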
- Show running job (and steps) system-level utilization (memory, I/O, energy):
  - note that `sstat` information is limited to your own jobs
- Show specific statistics from a running job (and steps) or multiple jobs:
  - use `sstat -e` to see a list of available output fields
- Output the statistics in a parseable format, delimited by `|` (with, then without, a trailing `|`):
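The `sstat` queries above could look like this. The options are standard `sstat` flags; the job IDs, the choice of output fields and the `run` helper are illustrative assumptions.

```shell
# Hypothetical dry-run helper (not part of SLURM): prints each command,
# or executes it when DRY_RUN=0 is set on a cluster login node.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

jobid=123456                     # hypothetical job IDs
jobid2=123457
fields="AveCPU,AveRSS,MaxRSS"    # example selection, see 'sstat -e'

run sstat -j "$jobid"                               # all default statistics
run sstat --format="$fields" -j "$jobid"            # selected fields, one job
run sstat --format="$fields" -j "$jobid,$jobid2"    # several jobs
run sstat -p --format="$fields" -j "$jobid"         # parseable, trailing |
run sstat -P --format="$fields" -j "$jobid"         # parseable, no trailing |
```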
- Show running or completed job (and steps) system-level utilization from the accounting information, and with full details:
- Show statistics relevant to the job allocation itself, not taking steps into consideration, then with more details:
- Show a subset of interesting statistics from a completed job and its steps, including:
  - elapsed time, both in human-readable form and as a total number of seconds
  - maximum resident set size of all tasks in the job (you may also want to add `maxrssnode` and `maxrsstask` to better understand which process consumed memory)
  - maximum virtual memory size (likewise `maxvmsizenode` and `maxvmsizetask`)
  - consumed energy (in Joules); be aware there are many caveats!
    - your job needs to be the only one running on the corresponding compute nodes
    - the RAPL mechanism will not take into account all possible hardware elements which consume power (CPUs, GPUs and DRAM are included)
- Output the same statistics in the parseable `|`-delimited format, for a single and multiple jobs:
  - use `sacct -e` to see a list of available output fields
- Show statistics for all personal jobs started since a particular date, then without job steps:
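The accounting queries in this section can be sketched with standard `sacct` options. The job IDs, the start date, the exact field selection and the `run` helper are illustrative assumptions.

```shell
# Hypothetical dry-run helper (not part of SLURM): prints each command,
# or executes it when DRY_RUN=0 is set on a cluster login node.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

jobid=123456                     # hypothetical job IDs
jobid2=123457
since="2030-01-01"               # hypothetical start date
# Example selection of fields, see 'sacct -e' for the full list
fields="jobid,jobname,state,elapsed,elapsedraw,maxrss,maxvmsize,consumedenergy"

run sacct -j "$jobid"            # job and steps
run sacct -j "$jobid" -l         # same, with full details
run sacct -X -j "$jobid"         # allocation only, no steps
run sacct -X -j "$jobid" -l      # allocation only, with more details

run sacct -j "$jobid" --format="$fields"              # subset of statistics
run sacct -p -j "$jobid" --format="$fields"           # parseable, one job
run sacct -p -j "$jobid,$jobid2" --format="$fields"   # parseable, several jobs

run sacct -S "$since" -u "$USER"     # your jobs since a date, with steps
run sacct -S "$since" -u "$USER" -X  # same, without job steps
```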
Pausing, resuming and cancelling jobs
- To stop a waiting job from being scheduled and later to allow it to be scheduled:
- To pause a running job and then resume it:
- To remove a job from the queue (stopping it if already started):
- To remove a job by name:
- To remove all user jobs:
- To remove all waiting jobs (`pending` state) for a given user:
- To remove all waiting jobs in a given partition (e.g. batch):
- To stop and restart a given job:
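The job control operations in this section correspond to standard `scontrol` and `scancel` commands, sketched below in long and short option forms. The job ID, the job name and the `run` helper are hypothetical. Because several of these commands cancel jobs, the helper only prints them unless `DRY_RUN=0` is set.

```shell
# Hypothetical dry-run helper (not part of SLURM): prints each command,
# or executes it when DRY_RUN=0 is set on a cluster login node.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

jobid=123456                     # hypothetical job ID
jobname="myjob"                  # hypothetical job name

run scontrol hold "$jobid"       # keep a waiting job from being scheduled
run scontrol release "$jobid"    # allow it to be scheduled again
run scontrol suspend "$jobid"    # pause a running job
run scontrol resume "$jobid"     # resume it

run scancel "$jobid"             # remove a job, stopping it if started
run scancel --name="$jobname"    # remove by name (long form)
run scancel -n "$jobname"        # remove by name (short form)
run scancel --user="$USER"       # remove all of your jobs (long form)
run scancel -u "$USER"           # remove all of your jobs (short form)

# Remove only pending jobs, then only those in the batch partition
run scancel --user="$USER" --state=pending
run scancel -u "$USER" -t PD
run scancel --user="$USER" --state=pending --partition=batch
run scancel -u "$USER" -t PD -p batch

run scontrol requeue "$jobid"    # stop and restart a job
```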