HPC @ Uni.lu

High Performance Computing in Luxembourg

This website is deprecated. The old pages are kept online, but you should refer in priority to the new website hpc.uni.lu and the new technical documentation site hpc-docs.uni.lu.

Partition (queue), node and license status

  • Show queued jobs, show more details (‘long’ view that includes the job time limit):
squeue
squeue -l
  • Show only the queued jobs of your user ($USER is an environment variable in your shell), then for another specific user:
squeue -u $USER
squeue -u vplugaru
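    • the output columns can be customized with squeue's -o/--format option, e.g. (the field selection and widths below are illustrative):
squeue -u $USER -o "%.10i %.9P %.30j %.8T %.10M %.6D %R"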
  • Show queued jobs in a specific partition:
squeue -p $partition
  • Show queued jobs that are in a specific state (pending / running / failed / preempted, see man squeue for all available states):
squeue -t PD
squeue -t R
squeue -t F
squeue -t PR
  • Show partition status, summarized status (without node state), and node-oriented partition status:
sinfo
sinfo -s
sinfo -N
  • Show node details including available features (to be used with the -C option of sbatch/srun):
sinfo -l -N
  • Show node reservations that have been created by the administrators for specific users or accounts:
sinfo -T
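    • if the administrators have granted you access to a reservation, you can submit jobs into it with the standard --reservation option ($reservationname below is a placeholder for the actual reservation name):
sbatch --reservation=$reservationname job.sh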
  • Show node details (all nodes, specific node):
scontrol show nodes
scontrol show nodes $nodename
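    • the per-node output is verbose; to extract just a few fields of interest you can filter it, e.g.:
scontrol show nodes $nodename | grep -E 'State|CPUAlloc|RealMemory'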
  • Check the default account your jobs will use:
sacctmgr show user $USER format=user%20s,defaultaccount%30s
  • See all account associations for your user and the QOS they grant access to:
sacctmgr list association where users=$USER format=account%30s,user%20s,qos%120s
  • See configured licenses and their status (number of tokens used and free):
scontrol show licenses

Job submission and management

Starting interactive jobs

  • Start an interactive job with the default number of cores and walltime:
srun -p interactive --pty bash -i
  • Start an interactive job for 30 minutes, with 2 nodes and 4 tasks per node:
srun -p interactive --time=0:30:0 -N 2 --ntasks-per-node=4 --pty bash -i
  • Start an interactive job with X11 forwarding such that GUI applications (running in the cluster) will be shown on your workstation:
    • note that your initial connection to the iris cluster needs to have X11 Forwarding enabled, e.g. ssh -X iris-cluster
srun -p interactive --pty --x11 bash -i
  • Start a best-effort interactive job (can be interrupted by regular jobs if other users submit them):
srun -p interactive --qos besteffort --pty bash -i
  • Start an interactive job asking for 8 Allinea Forge (DDT/MAP) licenses:
srun -p interactive -L forge:8 --pty bash -i
  • Start an interactive job asking for 8 Allinea Forge (DDT/MAP) licenses and 16 Allinea Performance Reports licenses:
srun -p interactive -L forge:8,perfreport:16 --pty bash -i

Note:

  • To make interactive jobs easier to launch, add to your ~/.bashrc:
    • alias si='srun -p interactive --pty bash -i' or e.g.
    • alias si='srun -p interactive --time=0:30:0 --pty bash -i'

Submitting passive jobs

We maintain a page dedicated to examples of SLURM batch (launcher) scripts that you can use for your batch jobs.

  • Submit to the queue a job script (job launcher) in which you’ve added SLURM directives (#SBATCH $directive) with the job specification (name, number of requested nodes, memory, walltime, etc.):
sbatch job.sh
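    • for reference, a minimal sketch of such a launcher script (all #SBATCH values below are illustrative, adjust them to your needs):
#!/bin/bash -l
#SBATCH --job-name=myjob        # job name (illustrative)
#SBATCH --nodes=1               # number of requested nodes
#SBATCH --ntasks=4              # total number of tasks
#SBATCH --time=0-01:00:00       # walltime as D-HH:MM:SS
#SBATCH --partition=batch       # target partition

srun /path/to/your/application  # replace with your actual program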
  • Submit a job script, overriding on the command line the number of requested nodes:
sbatch -N 2 job.sh
  • Submit a job script to the batch partition:
sbatch -p batch job.sh
  • Submit a job script to the long partition that permits a long walltime:
sbatch -p long job.sh
  • Submit a job script to the batch partition, requesting only nodes with Broadwell CPUs:
sbatch -p batch -C broadwell job.sh
  • Submit a job script to the batch partition, requesting only nodes with Skylake (AVX-512 ISA) CPUs:
sbatch -p batch -C skylake job.sh
  • Submit a job script to the gpu partition, requesting 2 cores and 2 GPUs on a single node:
sbatch -N 1 -n 2 --gpus=2 -p gpu job.sh
  • Submit a job script to the gpu partition, requesting 2 cores and 2 GPUs on a single node, each GPU with 32GB on-board memory:
sbatch -N 1 -n 2 --gpus=2 -C volta32 -p gpu job.sh
  • Submit a job script to the gpu partition, requesting 4 nodes with 2 cores/node and 4 GPUs/node:
sbatch -N 4 --ntasks-per-node=2 --gpus-per-node=4 -p gpu job.sh
  • Submit a job script to the bigmem partition, requesting 64 tasks (with 1 core/task) and 2TB of RAM on a single node:
sbatch -N 1 -n 64 --mem=2T -p bigmem job.sh
  • Submit a job script to the bigmem partition, requesting the full node (112 cores and all associated RAM, ~3TB):
sbatch -N 1 -n 112 -p bigmem job.sh
  • Submit a job script and request a specific start time:
    1. current day at a precise hour
    2. relative to a moment in time: now, today and tomorrow are recognized keywords, to be combined with seconds (the default), minutes, hours, days or weeks as time units
    3. relative to a moment in time combining time specifications
    4. specific date and hour
sbatch --begin=16:00 job.sh
sbatch --begin=tomorrow job.sh
sbatch --begin=now+2hours job.sh
sbatch --begin=2017-06-23T07:30:00 job.sh
  • Submit a best-effort job to the batch partition (can be interrupted by regular jobs if other users submit them; see the note below on requeueing):
sbatch -p batch --qos besteffort job.sh
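
Note:

  • What happens to a preempted best-effort job depends on the cluster's preemption configuration; if you want it to be eligible for automatic requeueing, you can additionally pass sbatch's standard --requeue flag:
sbatch -p batch --qos besteffort --requeue job.sh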

Collecting job information

  • Show the details of a job:
scontrol show job $jobid
  • Check waiting job priority (detailed view):
sprio -l
  • Check expected job start time:
squeue --start -u $USER
  • Show running job (and steps) system-level utilization (memory, I/O, energy):
    • note that sstat information is limited to your own jobs
sstat -j $jobid
  • Show specific statistics from a running job (and steps) or multiple jobs:
    • use sstat -e to see a list of available output fields
sstat -j $jobid --format=AveCPU,AveRSS,AveVMSize,MaxRSS,MaxVMSize
sstat -j $jobid1,$jobid2 --format=AveCPU,AveRSS,AveVMSize,MaxRSS,MaxVMSize
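    • to follow these statistics while the job runs, you can wrap sstat in the standard watch utility, e.g. refreshing every 60 seconds:
watch -n 60 sstat -j $jobid --format=AveCPU,AveRSS,MaxRSS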
  • Output the statistics in a parseable format, delimited by | (with, then without trailing |):
sstat -p -j $jobid --format=AveCPU,AveRSS,AveVMSize,MaxRSS,MaxVMSize
sstat -P -j $jobid --format=AveCPU,AveRSS,AveVMSize,MaxRSS,MaxVMSize
  • Show running or completed job (and steps) system-level utilization from the accounting information, and with full details:
sacct -j $jobid
sacct -j $jobid -l
  • Show statistics relevant to the job allocation itself, not taking steps into consideration, and with more details:
sacct -X -j $jobid
sacct -X -j $jobid -l
  • Show a subset of interesting statistics from a completed job and its steps, including:
    1. elapsed time, both human-readable and as a total number of seconds
    2. maximum resident set size of all tasks in the job (you may also want to add maxrssnode and maxrsstask for a better understanding of which node/task consumed the memory)
    3. maximum virtual memory size (idem for maxvmsizenode and maxvmsizetask)
    4. consumed energy (in Joules); be aware there are many caveats:
      • your job needs to be the only one running on the corresponding compute nodes
      • the RAPL mechanism does not take into account all hardware elements which consume power (only CPUs, GPUs and DRAM are included)
sacct -j $jobid --format=account,user,jobid,jobname,partition,state,elapsed,elapsedraw,start,end,maxrss,maxvmsize,consumedenergy,consumedenergyraw,nnodes,ncpus,nodelist
  • Output the same statistics in the parseable |-delimited format, for a single and multiple jobs:
    • use sacct -e to see a list of available output fields
sacct -p -j $jobid --format=account,user,jobid,jobname,partition,state,elapsed,elapsedraw,start,end,maxrss,maxvmsize,consumedenergy,consumedenergyraw,nnodes,ncpus,nodelist
sacct -p -j $jobid1,$jobid2 --format=account,user,jobid,jobname,partition,state,elapsed,elapsedraw,start,end,maxrss,maxvmsize,consumedenergy,consumedenergyraw,nnodes,ncpus,nodelist
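    • the |-delimited output is easy to post-process with other tools, e.g. realigned into a readable table with the standard column utility:
sacct -P -j $jobid --format=jobid,jobname,state,elapsed,maxrss | column -t -s '|'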
  • Show statistics for all personal jobs started since a particular date, then without job steps:
sacct --starttime 2017-05-01 -u vplugaru
sacct -X --starttime 2017-05-01 -u vplugaru

Pausing, resuming and cancelling jobs

  • To stop a waiting job from being scheduled and later to allow it to be scheduled:
scontrol hold $jobid
scontrol release $jobid
  • To pause a running job and then resume it:
scontrol suspend $jobid
scontrol resume $jobid
  • To remove a job from the queue (stopping it if already started):
scancel $jobid
  • To remove a job by name:
scancel --name=$jobname
scancel -n $jobname
  • To remove all user jobs:
scancel --user=$USER
scancel -u $USER
  • To remove all waiting jobs (pending state) for a given user:
scancel --user=$USER --state=pending
scancel -u $USER -t pending
  • To remove all waiting jobs in a given partition (e.g. batch):
scancel -u $USER --partition=batch
scancel -u $USER -p batch
  • To stop and restart a given job:
scontrol requeue $jobid
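
Note:

  • To hold all of your pending jobs at once (and later release them), you can combine squeue and scontrol; a sketch relying on squeue's -h (no header) and -o %i (job ID only) output options:
squeue -u $USER -t PD -h -o %i | xargs -r -n1 scontrol hold
squeue -u $USER -t PD -h -o %i | xargs -r -n1 scontrol release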