HPC @ Uni.lu

High Performance Computing in Luxembourg

OAR is an open-source batch scheduler that provides simple yet flexible facilities for exploiting a cluster. It manages cluster resources like a traditional batch scheduler (such as PBS, Torque, LSF or SGE) and is used in particular on the Grid’5000 platform. The official documentation is available here:

We provide a set of launcher scripts that may help your workflow.

Quick start

This small section is here to get you started as quickly as possible on the UL HPC platform.
You will find more in-depth details in the next sections, and you are encouraged to read them.

Using a computing node (server) interactively

  1. Get one computing core for one hour: oarsub -I -l nodes=1/core=1,walltime=1:0:0
  2. Get one core for 5 minutes: oarsub -I -l nodes=1/core=1,walltime=0:5:0
  3. Get two cores on the same node for one hour: oarsub -I -l nodes=1/core=2,walltime=1:0:0
  4. Get two cores on different nodes for one hour: oarsub -I -l nodes=2/core=1,walltime=1:0:0
  5. Get four cores (total) on two different nodes for one hour: oarsub -I -l nodes=2/core=2,walltime=1:0:0
  6. Get 10 cores, possibly on different nodes (!), for one hour: oarsub -I -l core=10,walltime=1
  7. Get 100 cores for one hour: oarsub -I -l core=100,walltime=1
  8. Get all cores and memory (implicitly) of a node for one hour: oarsub -I -l nodes=1,walltime=1
  9. Get all cores and memory (implicitly) of four nodes for 25 minutes: oarsub -I -l nodes=4,walltime=0:25
  10. Get as many cores as possible on a node which has Xeon Haswell processors for one hour:
    oarsub -I -l nodes=1/core=BEST,walltime=1 -p "cputype='xeon-haswell'"
  11. Get as many cores as possible on a node which has NVIDIA K80 GPUs for one hour:
    oarsub -I -l nodes=1/core=BEST,walltime=1 -p "gputype='K80'"
  12. Get a large memory machine (at least 1TB RAM) for 10 minutes:
    oarsub -l nodes=1,walltime=0:10 -t bigmem
  13. Get a large memory machine with Xeon Haswell processors for 10 minutes:
    oarsub -l nodes=1,walltime=0:10 -t bigmem -p "cputype='xeon-haswell'"

Notes:

  • interactively means that as soon as your OAR job starts, your terminal will be connected to the first computing node associated with your OAR job.
  • you close the interactive session with the command exit
  • requesting e.g. 100 computing cores will only reserve them for you, your application still needs to be able to use them by implementing a parallelism model (!).
  • you can connect (ssh) between computing nodes in your reservation with: oarsh nodename
  • job details are available from within the job itself in environment variables such as OAR_JOB_ID and OAR_NODEFILE

Using a computing node in batch (unattended) mode

  1. Run mycommand from the current directory on one computing core for one hour:
    oarsub -l nodes=1/core=1,walltime=1 ./mycommand
  2. Run mycommand from a given directory on 128 computing cores for 10 minutes:
    oarsub -l core=128,walltime=0:10 /path/to/mycommand
  3. Run mycommand parameter1 parameter2 that is in your PATH (environment) on a computing node for one hour, giving it a name such that it can be easily identified later:
    oarsub -n jobname -l nodes=1,walltime=1 "mycommand parameter1 parameter2"

Notes:

  • your terminal will not be connected to the job when it starts (after job submission you are still on the cluster access node)
  • you can connect to your running batch job with its known OAR job id with: oarsub -C jobid

Minimal batch script examples:

  1. Start a job through a script from the current directory which contains all your requirements and processing commands: oarsub -S ./myscript. Example myscript:
#!/bin/bash -l
#OAR -n myjobname
#OAR -l nodes=1/core=1,walltime=1

mycommand1
mycommand2 parameter1 parameter2
  2. Example script which loads a specific software module and runs that software in a given directory, saving its standard output and standard error streams to different files:
#!/bin/bash -l
#OAR -n myjobname
#OAR -l nodes=1/core=1,walltime=1

module load category/software
cd /path/to/directory
softwarename parameter1 parameter2  >out.log 2>out.err
  3. Example script which runs MPI parallel software using Intel MPI (toolchain/ictce module containing also compilers and libraries) on 128 cores on Xeon Haswell CPUs:
#!/bin/bash -l
#OAR -n myjobname
#OAR -l core=128,walltime=0:30:0
#OAR -p cputype='xeon-haswell'

module load toolchain/ictce
cd /path/to/directory
mpirun -hostfile $OAR_NODEFILE /path/to/your/software parameter1 parameter2
  4. Example script which runs MPI parallel software using Open MPI (mpi/OpenMPI module) on 128 cores on Xeon Haswell CPUs:
#!/bin/bash -l
#OAR -n myjobname
#OAR -l core=128,walltime=0:30:0
#OAR -p cputype='xeon-haswell'

module load mpi/OpenMPI
cd /path/to/directory
mpirun -hostfile $OAR_NODEFILE -x PATH -x LD_LIBRARY_PATH /path/to/your/software parameter1 parameter2

Notes:

  • Your script needs to be executable (chmod +x myscript) to be run in this way.
  • The same options you use on the oarsub command line can be provided inside the script (launched with oarsub -S ./myscript), prefixed by the #OAR pragma.
  • There are many more useful options available in oarsub, check man oarsub to see the full listing.

Checking the status of your job and stopping a job

  1. Check the status of all jobs on the cluster: oarstat
  2. Check the status of your jobs on the cluster: oarstat -u
  3. Get the status of a specific job by its OAR job id: oarstat -j jobid
  4. Get just the status of a specific job: oarstat -s -j jobid
  5. Get the full details of a specific job: oarstat -j jobid -f
  6. Get the full details of all your jobs: oarstat -u -f
  7. Stopping or removing a job before it started: oardel jobid
  8. Stopping or removing two jobs: oardel jobid1 jobid2

Notes:

  • The status is given in the S column of the oarstat output: W - Waiting, R - Running
  • A ‘Terminated’ state in oarstat -s -j jobid output indicates that the last command in the job exited with a 0 return code.
  • The ‘Error’ status in oarstat -s -j jobid output indicates that the return code was different from 0 (this may not indicate a problem with the job itself).
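A script that waits for a job usually needs to turn the oarstat -s -j jobid output into a yes/no decision. The following is a minimal sketch, assuming the "jobid: State" line format shown in the oarstat examples on this page; state_from_line and is_finished are hypothetical helper names, and the demo uses a fixed sample line instead of calling oarstat.

```bash
#!/bin/bash
# Extract the state from one line of `oarstat -s -j <jobid>` output,
# which looks like "600321: Running".
state_from_line() {
    local line="$1"
    echo "${line#*: }"
}

# Return 0 when the job has reached one of the two final states
# named in the notes above (Terminated or Error).
is_finished() {
    case "$1" in
        Terminated|Error) return 0 ;;
        *) return 1 ;;
    esac
}

# On the cluster: state=$(state_from_line "$(oarstat -s -j "$jobid")")
state=$(state_from_line "600321: Running")
if is_finished "$state"; then
    echo "job finished with state $state"
else
    echo "job still active (state: $state)"
fi
```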

Concepts

Reservation is handled on the front-end server by the command oarsub. For those who may not be familiar with batch scheduler vocabulary, the following definitions are now provided to better understand the different OAR mechanisms:

  • Submission: The system decides when your job begins, in order to optimize the global scheduling. If there is no available node, you may have to wait! (corresponds to oarsub -I or oarsub scriptName syntaxes)
  • Reservation: You decide when your job should begin, provided the node(s) will be available at that date. If you did not specify which node(s), the system will choose them for you. If the requested resources are not available at the date specified when the reservation is made, the reservation fails, and you must either change your resource request or change the job start date. At the start date, the reservation may only provide part of the resources you requested if some became unavailable (because they broke down meanwhile). (corresponds to oarsub -r or oarsub -r scriptName syntaxes)
  • Interactive: You just request some nodes, either by submission or reservation, and you then log in manually and work interactively. (corresponds to oarsub -I for submission or oarsub -r ; oarsub -C jobId for reservation)
  • Passive: You point to a script that should be automatically batched by the system; you don’t need to log in to the nodes at all. (corresponds to oarsub scriptName for submission or oarsub -r scriptName for reservation)
  • Types of job: There are basically two operating modes:

    • default: you just use the nodes default environment, whatever the scheduling (reservation or submission, interactive or passive);
    • best effort: this is a special operating queue with less priority, as explained below.

Job Type     Submission         Reservation
-----------  -----------------  ---------------------------
interactive  oarsub -I          oarsub -r ; oarsub -C jobId
passive      oarsub scriptName  oarsub -r scriptName

OAR provides the following features:

  • A better resource management: Using the Linux kernel feature called cpusets, OAR 2 allows a more reliable management of the resources. In particular:

    • No unattended processes should remain after a job completes - ever.
    • Access to the resources is now restricted to the de facto owner of the resources.

    Features like job dependency and check-pointing are now available, allowing better use of resources. A cpuset is attached to every process, and allows:

    • to specify which resources (processor/memory) can be used by a process, e.g. the resources allocated to the job in the OAR context.
    • to group and identify processes that share the same cpuset, e.g. the processes of a job in the OAR context, so that actions like clean-up can be performed efficiently. Here, cpusets provide a replacement for the group/session of processes concept, which is not efficient in Linux.
  • Resources hierarchies: OAR can manage complex hierarchies of resources. Here, we use the following hierarchy: (1) nodes, (2) cpu, (3) core. You’ll probably be interested only in requesting a given number of nodes or cores.
  • A modern cluster management system: By providing a mechanism to isolate jobs at the core level, OAR is one of the most modern cluster management systems. Users developing cluster or grid algorithms and programs thus work in an up-to-date environment similar to those they will meet with other recent cluster management systems, for instance on production platforms.
  • Optimization of the resources usage: Nowadays, machines with more than 4 cores are common, so it is very important to handle cores efficiently. By providing resource selection and process isolation at the core level, OAR allows users whose experiments do not require exclusive use of a node (at least during a preparation phase) to use a single core on each of many nodes while leaving the remaining cores free for other users. This helps optimize the number of available resources. OAR also provides a time-sharing feature that allows the same set of resources to be shared among users, which is especially useful during demonstrations or events such as plugtests.
  • Easier access to the resources: With the OAR oarsh connector for accessing job resources, basic usage no longer requires users to configure their SSH environment, as everything is handled internally (known host keys management, etc.). Users who prefer not to use oarsh can still use ssh at the cost of setting some options (one of the features of the oarsh wrapper is precisely to hide these options).

Job notion in OAR

In OAR, a job is defined by a number of required resources and, optionally, a script/program to run. The user must therefore specify how many resources, and of what kind, the application needs. OAR then grants the request or not, and controls the execution. When a job is launched, OAR executes the user program only on the first reserved node. The following environment variables are defined once a job is created to describe the reservation:

$OAR_NODEFILE contains the name of a file which lists all reserved nodes for this job
$OAR_JOB_ID contains the OAR job identifier
$OAR_RESOURCE_PROPERTIES_FILE contains the name of a file which lists all resources and their properties
$OAR_JOB_NAME name of the job given by the “-n” option of oarsub
$OAR_PROJECT_NAME job project name

Submitting a job is conducted using the oarsub command. Mainly, you’ll use this command in two ways:

  • oarsub [options] -I : for an interactive job (see previous glossary)
  • oarsub [options] scriptName : for a passive job to execute the script scriptName (note that this script is only executed on the first reserved node).

The most useful options are the following (see oarsub(1) for more details):

  • -I, --interactive: Request an interactive job. Open a login shell on the first node of the reservation instead of running a script.
  • -l, --resource=<list>: Set the requested resources for the job. Here you may specify the number of nodes, cpus and cores (separated by a slash ‘/’) and the walltime of the job, i.e. its duration. The walltime format is hour:mn:sec or hour:mn (a bare number is read as hours).
  • Ex: -l nodes=2/cpu=1/core=2,walltime=2:00:00 reserves 2 cores on 1 cpu of each of 2 nodes.
  • -r, --reservation=<date>: Request a job start time reservation, instead of a direct submission.
  • -n, --name=<txt>: Specify an arbitrary name for the job
  • --project=<txt> : Specify a name of a project the job belongs to
  • -d, --directory=<dir>: Specify the directory where to launch the command (default is current directory)
  • --notify=<txt>: Specify a notification method (mail or command to execute). Ex: --notify "mail:name@domain.com" or --notify "exec:/path/to/script args"
  • -O --stdout=<file>: Specify the file that will store the standard output stream of the job. (the %jobid% pattern is automatically replaced)
  • -E --stderr=<file>: Specify the file that will store the standard error stream of the job. (the %jobid% pattern is automatically replaced)

Once a job is launched, you can access the reserved resources through the oarsh command. Connections through ssh are prohibited.

Request (hierarchical) resources with oarsub

By default, if you execute oarsub without resource parameters, you will request 1 computing core for 2 hours.
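Walltimes are written as hours, hour:mn or hour:mn:sec, while the admission rule messages printed by oarsub report them in seconds (7200 for the 2-hour default). The small sketch below converts between the two, which can help when checking what was actually requested. walltime_to_seconds is a hypothetical helper, not part of OAR.

```bash
#!/bin/bash
# Convert an OAR walltime (hour, hour:mn or hour:mn:sec) to seconds.
walltime_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so that fields like "08" are not read as octal.
    echo $(( 10#${h:-0} * 3600 + 10#${m:-0} * 60 + 10#${s:-0} ))
}

walltime_to_seconds 2          # the 2-hour default
walltime_to_seconds 0:25       # 25 minutes
walltime_to_seconds 4:00:00    # 4 hours
```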

In order to request a specific amount of resources, you should use the -l option of oarsub and use a hierarchical reservation (characterized with the / separator). For instance, to reserve 1 core on 8 nodes for 4h, you can use: oarsub -l nodes=8/core=1,walltime=4:00:00

Other examples follow and are probably self-explanatory:

 # reserve 4 cores belonging to the same CPU (total: 4 cores)
 $> oarsub -l cpu=1/core=4 ...

 # 2 cores on 3 nodes (same enclosure) for 3h15: (total: 6 cores)
 $> oarsub -I -l /enclosure=1/nodes=3/core=2,walltime=3:15

 # 4 cores on a GPU node for 8 hours (Total: 4 cores)
 $> oarsub -l /core=4,walltime=8 -p "gpu='YES'"

 # 2 nodes among the h-cluster1-* nodes (Chaos only) (total: 24 cores)
 $> oarsub -l nodes=2 -p "nodeclass='h'" ...

 # 4 cores on 2 GPU nodes + 20 cores on other nodes (total: 28 cores)
 $> oarsub -I -l "{gpu='YES'}/nodes=2/core=4+{gpu='NO'}/core=20"

Reservation of resources at a given time

You can use the -r option of oarsub to specify the date at which you want the reservation to start. The date format to pass to the -r option is: YYYY-MM-DD HH:MM:SS

For instance, the following command reserves 2 cores on each of 4 nodes (i.e. 8 cores) to launch the script myscript.sh at 23:30:

[16:55:06] hcartiaux@access(chaos-cluster) ~$> oarsub -l nodes=4/core=2 -r "2012-09-24 23:30:00" ./myscript.sh
[ADMISSION RULE] hcartiaux is granted the privilege to do unlimited reservations
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=147550
Reservation mode : waiting validation...
Reservation valid --> OK
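Typing the start date by hand is error-prone, so it can be generated instead. The sketch below builds a string in the format expected by the -r option. It assumes GNU date (for the -d option), and reservation_date is a hypothetical helper. The oarsub command is only printed, not executed.

```bash
#!/bin/bash
# Build a start date in the format expected by `oarsub -r`.
# Assumes GNU date (-d option), as found on typical Linux front-ends.
reservation_date() {
    date -d "$1" "+%Y-%m-%d %H:%M:%S"
}

start=$(reservation_date "tomorrow 23:30")
# Print the command rather than running it, so this can be tried anywhere.
echo "oarsub -l nodes=4/core=2 -r \"$start\" ./myscript.sh"
```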

Select nodes precisely with properties

You should use in this case what is called OAR properties with the -p option. The general syntax for this option is as follows: oarsub -p "< property >='< value >'"

You can combine different properties logically (with AND/OR etc). Ex: oarsub -p "nodeclass='h' OR nodeclass='d'"

If you want to use a GPU node, use this command: oarsub -I -p "gpu='YES'"

If you want to use nodes from the bigmem class, try the following: oarsub -I -t bigmem. Likewise, for the bigsmp class, try: oarsub -I -t bigsmp. Please see below for details.

Global properties

Global Property  Description                             Example
---------------  --------------------------------------  -----------------------------------------------
host             Full hostname of the resource           -p "host='h-cluster1-14.chaos-cluster.uni.lux'"
network_address  Short hostname of the resource          -p "network_address='h-cluster1-14'"
disktype         Type of disk (sas/sata/raid/ssd)        -p "disktype='sas'"
memnode / mem    RAM size available per node             -p "memnode='24'"
memcpu           RAM size available per CPU              -p "memcpu='12'"
memcore          RAM size available per core             -p "memcore='2'"
cpucore          Number of cores per CPU                 -p "cpucore='6'"
cpufreq          Frequency of the processor              -p "cpufreq='2.26'"
enclosure        Enclosure ID (same IB+Ethernet switch)  -p "enclosure='1'"
nodemodel        Node model name                         -p "nodemodel='Bull_B500'"
gpu              GPU availability                        -p "gpu='YES'"
gputype          GPU card model (M2070, M2090)           -p "gputype='M2090'"
gpuecc           GPU ECC feature (YES, NO)               -p "gpuecc='YES'"
os               Operating System of the host            -p "os='debian7'"

Chaos and Gaia properties

Chaos is heterogeneous, therefore, we provide properties in order to permit the reservation of a homogeneous subset of nodes. Gaia is homogeneous, at least for the default job submissions.

Here is a summary of the most useful properties (you can see them on Monika for Chaos and Gaia):

Property   Description                                       Example
---------  ------------------------------------------------  ------------------
nodeclass  Class of node, i.e. the sub-cluster considered    -p "nodeclass='h'"
room       Location of the node (server room), AS28 or CS43  -p "room='AS28'"

Connecting to the reserved nodes

Assuming you have a job running and therefore a set of resources reserved for you on the cluster, you can connect to the first reserved node using

  oarsub -C <JOB_ID>

Then you can connect to the other reserved nodes using oarsh. Example:

 [10:53][user@access:~]: oarsub -C 2802044
 Connect to OAR job 2802044 via the node gaia-48
 [10:53][user@gaia-48:~]: cat $OAR_NODEFILE
 gaia-48
 gaia-48
 gaia-48
 gaia-48
 gaia-48
 gaia-48
 gaia-48
 gaia-48
 gaia-67
 gaia-67
 [10:53][user@gaia-48:~]: oarsh gaia-67
 Warning: Permanently added '[gaia-67]:6667,[10.226.1.67]:6667' (RSA) to the list of known hosts.
 Last login: Mon Feb 24 13:41:44 2014 from access.gaia-cluster.uni.lux
 [10:53][user@gaia-67:~]: logout
 Connection to gaia-67 closed.
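Since $OAR_NODEFILE contains one line per reserved core, as in the listing above, it can be summarized to see how the reservation is spread over nodes. The following is a small sketch; cores_per_node is a hypothetical helper, and a sample file is used when the script runs outside a job.

```bash
#!/bin/bash
# List each reserved node with its number of reserved cores.
# The node file has one line per reserved core.
cores_per_node() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Outside a job, fall back to a small sample file for demonstration.
nodefile="${OAR_NODEFILE:-}"
if [ -z "$nodefile" ]; then
    nodefile=$(mktemp)
    printf '%s\n' gaia-48 gaia-48 gaia-48 gaia-67 > "$nodefile"
fi

cores_per_node "$nodefile"
echo "total cores: $(grep -c '' "$nodefile")"
```

The first column of the output can also serve as the list of node names to pass to oarsh.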

Select bigsmp and bigmem nodes

Some nodes are very specific (the nodes with >= 1TB of memory and the BCS computing node of Gaia with 160 cores in ccNUMA architecture), and can only be reserved with an explicit oarsub parameter, -t bigmem or -t bigsmp:

Cluster  Type           Node          # cores  Memory  Oarsub example
-------  -------------  ------------  -------  ------  --------------
chaos    bigmem         r-cluster1-1  32       1024GB  oarsub -I -t bigmem
gaia     bigsmp+bigmem  gaia-73       160      1024GB  oarsub -I -t bigsmp -p "network_address='gaia-73'"
gaia     bigmem         gaia-74       32       1024GB  oarsub -I -t bigmem --project project_biocore -p "network_address='gaia-74'"
gaia     bigsmp+bigmem  gaia-80       120      3072GB  oarsub -I -t bigsmp --project project_rues
gaia     bigsmp+bigmem  gaia-81       160      4096GB  oarsub -I -t bigsmp --project project_sgi -p "os='rhel6'"
gaia     bigmem         gaia-183      64       2048GB  oarsub -I -t bigmem --project project_biocore -p "network_address='gaia-183'"
gaia     bigmem         gaia-184      64       2048GB  oarsub -I -t bigmem -p "network_address='gaia-184'"

Please, only use these facilities if your jobs strictly require them, otherwise queueing is increased.

Additionally, it is preferable to reserve the complete node with the parameter -l nodes=1, and to adapt your workflow accordingly in order to take advantage of its full potential (exception: bigmem/bigsmp class).

The --project parameter is required to access some of these special computing systems as they are dedicated to a specific group that you must be a part of.

Select moonshot nodes (on Gaia)

Since 2015, the Gaia cluster includes HP Moonshot nodes that feature energy efficient, low power Xeon CPUs. As these nodes have a specific configuration (4 cores/node, 10GbE networking and no Infiniband), they can only be reserved with an explicit oarsub parameter: -t moonshot:

Cluster  Type      Node              # cores  Memory  Oarsub example
-------  --------  ----------------  -------  ------  ---------------------
gaia     moonshot  moonshot1-[1-45]  180      1440GB  oarsub -I -t moonshot
gaia     moonshot  moonshot2-[1-45]  180      1440GB  oarsub -I -t moonshot

Container

With OAR, it is possible to execute jobs within another one. This functionality is called “container jobs”.

First, a job of type container must be submitted, for example:

hcartiaux@access(gaia-cluster) ~$> oarsub -I -t container -l nodes=3,walltime=2:10:00
OAR_JOB_ID=723303
Interactive mode : waiting...
Starting...
Connect to OAR job 723303 via the node gaia-12

Then it is possible to use the inner type to schedule the new jobs within the previously created container job:

hcartiaux@access(gaia-cluster) ~$> oarsub -I -t inner=723303 -l core=16
OAR_JOB_ID=723557
Interactive mode : waiting...
Starting...
Connect to OAR job 723557 via the node gaia-11

Note that an inner job cannot be a reservation (i.e. it cannot overlap the container reservation).

‘besteffort’ versus ‘default’

By default, your jobs go to the default queue, meaning they all have equivalent priority. You can also create so-called best-effort jobs, which are scheduled in the besteffort queue. Their particularity is that they are killed if another, non-besteffort job needs the resources on which they are running.

Here is an example of a simple oarsub command, which submits a besteffort job: oarsub -t besteffort /path/to/prog

For example, you can use this feature to maximize the use of the cluster with multiparametric jobs. When you submit a job, use the -t besteffort option of oarsub to specify that it is a besteffort job. The benefit of best-effort jobs is that their constraints (walltime and maximum number of active jobs per user) are more relaxed than those of regular jobs. They are summarized below.

Job Type    Max Walltime (hour)  Max #active_jobs  Max #active_jobs_per_user
----------  -------------------  ----------------  -------------------------
default     120:00:00            30000             50
besteffort  9000:00:00           10000             1000

Important : a besteffort job cannot be a reservation.

If your job is of type besteffort and idempotent (both set with the oarsub -t option) and is killed by the OAR scheduler, another job is automatically created and scheduled with the same configuration. Your job is also resubmitted if the exit code of your program is 99. This is an extremely useful facility for jobs that can be restarted, and it benefits some workflows.

Consequently, besteffort jobs allow you to cut your computation into small slots and exceed the policy restrictions for default jobs, without disturbing the workflows of other users. Idempotent jobs are resubmitted indefinitely until their completion.

This workflow assumes that you implement the needed changes in your program, or launcher scripts, and that you tolerate a loss of cpu time in some cases.

Here is an example of a oarsub command, which submits a besteffort / idempotent job: oarsub -t besteffort -t idempotent /path/to/prog

Note: If you are a member of the besteffortusers group on the cluster, then ALL your jobs will be by default of type besteffort and you will be notified by OAR as follows:

yourlogin@access ~> oarsub [...]
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
[ADMISSION RULE] !!!! WARNING                                          !!!
[ADMISSION RULE] !!!! AS AN EXTERNAL USER, YOU HAVE BEEN AUTOMATICALLY !!!
[ADMISSION RULE] !!!! REDIRECTED TO THE BEST-EFFORT QUEUE              !!!
[ADMISSION RULE] !!!! YOUR JOB MAYBE KILLED WITHOUT NOTICE             !!!

Scheduling priority (karma)

The OAR batch scheduler uses internally a karma value in order to determine the priority of the user jobs.

Assuming that:

  • user_requested: number of (cores x hours) requested by the user over the last 30 days
  • all_requested: number of (cores x hours) requested by all the users over the last 30 days
  • user_used: number of (cores x hours) used by the user over the last 30 days
  • all_used: number of (cores x hours) used by all the users over the last 30 days

Karma = 2 x user_used / all_used + user_requested / all_requested

The requested values correspond to the walltimes the user specifies on the oarsub command line.
The used values correspond to the actual timespan of the job, i.e. end_time - start_time.
If a job uses its full walltime, then used is the same as requested; otherwise used < requested.

Important: When scheduling jobs, OAR favours jobs with a lower karma.

Practically, this means that:

  • low usage in the last 30 days => low karma => more priority
  • high usage in the last 30 days => high karma => less priority
  • if a user asks for a walltime much longer than he/she actually uses, his/her karma will be higher than that of a different user with the same usage but who has correctly set the job specification
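The effect of over-requesting walltime can be checked numerically with the karma formula given above. The sketch below evaluates it with awk; the karma function name and the core-hour figures are hypothetical values chosen for illustration only.

```bash
#!/bin/bash
# Karma = 2 x user_used / all_used + user_requested / all_requested
# Arguments: user_used all_used user_requested all_requested (core-hours).
karma() {
    awk -v uu="$1" -v au="$2" -v ur="$3" -v ar="$4" \
        'BEGIN { printf "%.4f\n", 2 * uu / au + ur / ar }'
}

# Two hypothetical users with the same usage (100 of 10000 core-hours):
karma 100 10000 100 20000    # requested exactly what was used
karma 100 10000 1000 20000   # requested ten times more than was used
```

The second user ends up with the higher karma, and therefore the lower priority, although both consumed the same amount of resources.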

Checkpointing

Definition from wikipedia: “Checkpointing is a technique for inserting fault tolerance into computing systems. It basically consists of storing a snapshot of the current application state, and later on, use it for restarting the execution in case of failure.”

Checkpointing your job enables the following features:

  • The job can be stopped/restarted at will
  • The job can survive scheduled or unscheduled downtimes
  • The job can overcome queue time limits (whether 10, 2 or 1 days, the limit becomes irrelevant). 500h jobs? No problem!
  • The job minimizes its waiting time in the queue since it asks for fewer resources (in multiple batches, of course).

Finally, if you have jobs that get killed because they reach walltime limits that you cannot forecast in advance, checkpointing overcomes that problem too, in the most elegant way.

In fact, if your jobs run for more than 1 day, the “social” way to do HPC involves checkpoint; we understand that users often run code developed by third-parties so they can’t do much about it, but then again, did you ask the software developers about the feature? Kindly do so at the first opportunity, to increase the quality of your work.

OAR integration

The workflow described above is implementable by combining several OAR features:

  • besteffort jobs, described in the previous section ;
  • idempotent: if your process returns an exit code equal to 99, your job will be resubmitted with the same parameters ;
  • checkpoint parameter: enables the checkpointing mechanism and specifies the time in seconds before the end of the walltime at which a signal is sent to the first process of the job ;
  • signal parameter: specify which signal to use when checkpointing (default is SIGUSR2).

Example

Example: oarsub --checkpoint 600 --signal 12 -t besteffort -t idempotent /path/to/prog

With this command, OAR sends the SIGUSR2 signal (12) to the job 600 seconds before the walltime ends. If the program then returns the exit code 99, the job is resubmitted. Note that if OAR kills a best-effort job in order to schedule a default job, no signal is sent.

Your program, which will probably be a launcher, can trap the checkpointing signal, and implement a “checkpoint - restart” feature in a few lines of code. You can read these examples of launchers written in bash (you will probably have to adapt them to your case):

In these two examples, the oar parameters are given in the header of the script, you can submit them directly with the -S parameter.

If you are unfamiliar with the signal mechanisms of Unix, this could be an easy start: wikipedia
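As an illustration of the trap mechanism, here is a minimal sketch of a checkpoint/restart launcher. It is not one of the launchers referred to above: CKPT_FILE, TOTAL_STEPS and the one-second sleep standing in for real work are all hypothetical, and a real program would save its own state instead of a step counter.

```bash
#!/bin/bash
# Checkpoint/restart launcher sketch. CKPT_FILE, TOTAL_STEPS and the
# one-second "work" step are hypothetical stand-ins for a real program.
CKPT_FILE="${CKPT_FILE:-./checkpoint.state}"
TOTAL_STEPS="${TOTAL_STEPS:-2}"
STEP=0

save_step() { echo "$STEP" > "$CKPT_FILE"; }

# Resume from the saved step if a checkpoint file exists.
load_step() {
    if [ -f "$CKPT_FILE" ]; then STEP=$(cat "$CKPT_FILE"); else STEP=0; fi
}

run_job() {
    load_step
    # On the checkpoint signal (SIGUSR2): save the state, then exit with
    # code 99 so that an idempotent job is resubmitted.
    trap 'save_step; exit 99' USR2
    while [ "$STEP" -lt "$TOTAL_STEPS" ]; do
        sleep 1                # stand-in for one unit of real work
        STEP=$((STEP + 1))
    done
    trap - USR2
    rm -f "$CKPT_FILE"         # completed: nothing left to resume
}

run_job
echo "finished at step $STEP"
```

Saved as an executable script (say launcher.sh, a hypothetical name), it could be submitted with the options from the example above: oarsub --checkpoint 600 --signal 12 -t besteffort -t idempotent ./launcher.sh. Bash runs the trap only after the current foreground command returns, so each unit of work should be short compared with the checkpoint delay.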

Statistics with oarstat

  • You can visualize all the submitted jobs with the command oarstat.

    [17:06:10] hcartiaux@access(gaia-cluster) ~$> oarstat
    Job id     Name           User           Submission Date     S Queue
    ---------- -------------- -------------- ------------------- - ----------
    600321     node maintenan fgeorgatos     2012-09-19 13:36:53 R default
    715116     P50_cont_0     sdorosz        2012-09-24 08:22:22 R default
    715117     P50_cont_1     sdorosz        2012-09-24 08:22:23 R default
    715118     P50_cont_2     sdorosz        2012-09-24 08:22:23 R default
    715119     P50_cont_3     sdorosz        2012-09-24 08:22:23 R default
    ...
    
  • View the details with the -f parameter : oarstat -f
  • Select a specific job with the -j parameter, followed by its job ID.

    [17:11:22] hcartiaux@access(gaia-cluster) ~$> oarstat -f -j 600321
    Job_Id: 600321
        job_array_id = 600321
        job_array_index = 1
        name = node maintenance
        project = default
        owner = fgeorgatos
        state = Running
        wanted_resources = -l "{type = 'default'}/host=1/core=12,walltime=168:0:0"
        assigned_resources = 397+398+399+400+401+402+403+404+405+406+407+408
        assigned_hostnames = gaia-34
        queue = default
        command = /bin/sleep 600000
    ...
    
  • View the status of a job with the -s parameter:

    oarstat -s -j 600321
    600321: Running
    
  • View all jobs submitted by a user with -u parameter:

    [17:13:35] hcartiaux@access(gaia-cluster) ~$> oarstat -u fgeorgatos
    Job id     Name           User           Submission Date     S Queue
    ---------- -------------- -------------- ------------------- - ----------
    600321     node maintenan fgeorgatos     2012-09-19 13:36:53 R default
    

Visualization tools for cluster activity

OAR comes with two monitoring tools, each of them installed on the cluster front-end:

  • Monika is a web interface which monitors batch scheduler reservations. It tries to display a very synthetic view of the current cluster state with all active and waiting jobs.

  • Draw OAR gantt creates a Gantt chart which shows how jobs are distributed over the nodes over time. It is very useful for seeing past cluster occupation and for knowing when a job will be launched in the future.

Typical example of job submission

Default Interactive job oarsub -I

By default, 1 core is reserved and the default walltime is 2h (the maximum walltime is set to 12 hours for interactive jobs). Each job receives an id (stored in $OAR_JOB_ID on the first reserved node).

[14:47:26] svarrette@access ~> oarsub -I
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
OAR_JOB_ID=76715
Interactive mode : waiting...
Starting...

Connect to OAR job 76715 via the node d-cluster1-9
Use of d-cluster1-9 :
 14:49:24 up 10 days, 22:27,  1 user,  load average: 3.00, 3.00, 3.00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
oar      pts/1    cluster1a.chaos. 14:49    0.00s  0.00s  0.00s sshd: oar [priv]

TTY = /dev/pts/1 , TERM = xterm-color, no DISPLAY
[14:49:24] svarrette@d-cluster1-9 ~> echo $OAR_NODEFILE
/var/lib/oar/76715
[14:51:03] svarrette@d-cluster1-9 ~> cat $OAR_NODEFILE
d-cluster1-9.chaos.lu
[14:51:10] svarrette@d-cluster1-9 ~> echo $OAR_JOB_ID
76715

Of course, you can specify the walltime of your reservation (e.g. 8h), the number of cores/nodes, etc. Ex: oarsub -I -l core=2,walltime=8

Use this type of submission when you want to compile and/or check a given aspect of your program/script, etc. From the front-end, you can check the current jobs associated with your login by issuing: oarstat [-f] -u yourlogin

You can also connect to one of the reserved nodes using the oarsh utility as follows:

OAR_JOB_ID=< jobid > oarsh < nodename >

Ex: OAR_JOB_ID=76715 oarsh d-cluster1-9

Any other attempt to connect will fail (using ssh or oarsh without job ID):

[15:20:00] svarrette@access ~> ssh d-cluster1-9
********************************************************
 /!\ WARNING:  Direct login by ssh is forbidden.

 Use oarsub(1) to reserve nodes, and oarsh(1) to connect to your reserved nodes,
 typically by:
  OAR_JOB_ID=<jobid> oarsh <nodename>

 User doc: https://hpc.uni.lu/tiki-index.php?page=User+Documentation
********************************************************
Connection closed by 192.168.200.59
[15:20:05] svarrette@access ~> oarsh d-cluster1-9
oarsh: Cannot connect. Please set either a job id or a job key in your
oarsh: environment using the OAR_JOB_ID or the OAR_JOB_KEY_FILE variable.
[15:24:00] svarrette@access ~>

Once you’ve finished, just execute ‘CTRL+D’ or ‘logout’ to leave the reservation.

Default Passive job oarsub scriptname

Once you ensure your program etc. is working correctly in interactive mode, it is time to list the commands you want to do in a script which name is given to oarsub. This script is to be executed on the first reserved node once the resources are attributed. It means also that this script has access to the OAR environment variables ( $OAR_NODEFILE etc.). You will probably end up in one of the following cases:

  • you want to execute many instances of the same sequential program myprog on the allocated resources. Typically, each execution receives a different parameter, and you benefit from the many cores available.
  • you want to run a truly parallel program written with a parallel library such as OpenMP/MPI/CUDA/OpenCL.

We have set up a GitHub repository (ULHPC) with templates for launcher scripts that you can use as a starting point; they are meant to cover the main workflows encountered so far on the cluster. You are also welcome to contribute to the repository by proposing your own launcher scripts.
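
As an illustration of the first case (one sequential program, many parameters), here is a minimal launcher sketch for a reservation on a single node. It is not one of the ULHPC templates: it uses xargs -P as a simple stand-in for a real launcher or GNU parallel, it assumes that $OAR_NODEFILE contains one line per reserved core, and the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical launcher sketch for a single-node reservation: run one
# instance of a sequential program per parameter, with at most one
# running instance per reserved core.

# Number of reserved cores = number of lines in the node file (assumption:
# one line per core). Falls back to 1 outside of an OAR job.
reserved_cores() {
    if [ -n "$OAR_NODEFILE" ] && [ -r "$OAR_NODEFILE" ]; then
        wc -l < "$OAR_NODEFILE" | tr -d ' '
    else
        echo 1
    fi
}

# run_sweep PROGRAM PARAM...: start PROGRAM once per PARAM, in parallel.
# Parameters must not contain whitespace, since xargs splits on it.
run_sweep() {
    prog="$1"; shift
    printf '%s\n' "$@" | xargs -P "$(reserved_cores)" -n 1 "$prog"
}

# Example (myprog is the placeholder name used above):
#   run_sweep ./myprog 1 2 3 4
```

The instances finish in an arbitrary order, so each one should write its results to its own file.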

For debugging reasons, you are requested to ALWAYS try your scripts in interactive mode prior to their invocation in passive mode. At the end of the run, two files are created in the current directory:

  • OAR.%jobid%.stdout for standard output produced during script execution
  • OAR.%jobid%.stderr for error output produced during script execution

You can change the names of those files with the options -O (standard output) and -E (error output). Another interesting option is --notify, which sends a notification when the script ends (typically by mail).
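
A passive submission combining these options could look like the sketch below. The log directory, the file name pattern and the mail address are placeholders, and the function name and the OARSUB variable are hypothetical (OARSUB only exists so that the command line can be printed instead of executed). Check oarsub --help on the cluster for the exact syntax accepted by --notify.

```shell
# submit_with_logs SCRIPT LOGDIR MAIL: passive submission with custom log
# file names and a mail notification. The %jobid% pattern is left as-is for
# OAR to replace with the job ID.
# OARSUB is a hypothetical override used for dry runs (e.g. OARSUB=echo).
submit_with_logs() {
    script="$1"; logdir="$2"; mail="$3"
    ${OARSUB:-oarsub} \
        -O "$logdir/job.%jobid%.out" \
        -E "$logdir/job.%jobid%.err" \
        --notify "mail:$mail" \
        "$script"
}

# Example (placeholder values):
#   submit_with_logs ./launcher.sh "$HOME/logs" first.last@example.org
```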

Note: the job ends (or is killed) as soon as one of the following events occurs:

  • the script execution ends (successfully or otherwise);
  • the walltime expires.

IMPORTANT: to avoid filling the storage space with unnecessary files, always remember to clean up (i.e. remove) the OAR log files as soon as possible.
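
One possible way to automate this cleanup is sketched below. It assumes the default OAR.<jobid>.stdout and OAR.<jobid>.stderr file names; the function name and the default age of 7 days are arbitrary choices.

```shell
# clean_oar_logs [DIR] [DAYS]: delete the OAR.<jobid>.stdout and
# OAR.<jobid>.stderr files located directly in DIR (default: current
# directory) that were last modified more than DAYS days ago (default: 7).
clean_oar_logs() {
    dir="${1:-.}"; days="${2:-7}"
    find "$dir" -maxdepth 1 -type f \
        \( -name 'OAR.*.stdout' -o -name 'OAR.*.stderr' \) \
        -mtime +"$days" -exec rm -f {} +
}

# Example: remove the logs older than 30 days in the current directory
#   clean_oar_logs . 30
```

Only files matching the two OAR patterns are touched; other files in the directory are left alone.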

Submission constraints / limitations

Depending on the type of job, you will face the following constraints:

Job Type      Max Walltime (hh:mm:ss)   Max #active_jobs   Max #active_jobs_per_user
interactive   12:00:00                  10000              5
default       120:00:00 (5 days)        30000              50
besteffort    9000:00:00 (375 days)     10000              1000
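
To check a walltime against these limits before submitting, the small sketch below converts the OAR walltime notation (hours, hours:minutes or hours:minutes:seconds) to seconds and compares it with the values of the table. The function names are hypothetical and the limits are simply copied from the table above, so they need updating if the cluster policy changes.

```shell
# walltime_to_seconds H[:M[:S]]: convert an OAR walltime to seconds.
walltime_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# max_walltime TYPE: maximum walltime for a job type (from the table above).
max_walltime() {
    case "$1" in
        interactive) echo "12:00:00" ;;
        default)     echo "120:00:00" ;;
        besteffort)  echo "9000:00:00" ;;
        *) return 1 ;;
    esac
}

# fits_walltime TYPE WALLTIME: succeed if WALLTIME is within the limit.
fits_walltime() {
    limit=$(max_walltime "$1") || return 2
    [ "$(walltime_to_seconds "$2")" -le "$(walltime_to_seconds "$limit")" ]
}

# Example:
#   fits_walltime interactive 8 && oarsub -I -l core=2,walltime=8
```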

If you (really) need to run jobs that require more than 3 days of computations,

  • ask yourself if you really exploit all the parallel resources offered to you (i.e. see if GNU parallel can help you to speed up your computation)
  • try to use besteffort jobs
  • retry to use besteffort jobs
  • really try to use besteffort jobs ;)
  • consider buying dedicated hardware
  • in a very few (and well justified) cases, we can define dedicated projects that have individual (and independent) constraints. More precisely, for each project name, a new OAR property for_name: YES/NO is created, together with an LDAP group project_name. This property is set to YES on dedicated resources, so that members of the project_name group can use the "oarsub --project project_name" syntax to create jobs limited to the constraints of the project.
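
The two submission variants mentioned in this list can be sketched as follows. The function name and the OARSUB variable are hypothetical (OARSUB only allows printing the command instead of running it); -t besteffort and --project are the oarsub options referred to above.

```shell
# submit_long SCRIPT [PROJECT]: submit SCRIPT as a besteffort job or, when a
# PROJECT name is given, within the constraints of that dedicated project.
# OARSUB is a hypothetical override used for dry runs (e.g. OARSUB=echo).
submit_long() {
    script="$1"; project="$2"
    if [ -n "$project" ]; then
        ${OARSUB:-oarsub} --project "$project" "$script"
    else
        ${OARSUB:-oarsub} -t besteffort "$script"
    fi
}

# Examples (launcher.sh and project_name are placeholders):
#   submit_long ./launcher.sh
#   submit_long ./launcher.sh project_name
```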

OAR API

The OAR REST API lets you interact with OAR over HTTP using a REST library. Most of the operations usually done with the OAR Unix commands can also be done through this API from your favourite language.

The OAR REST API is installed on the cluster front-ends (access-chaos.uni.lu and access-gaia.uni.lu) and is available at this URL: https://localhost/oarapi/

For more information, refer to the official documentation.

  • Get the information corresponding to a job id

    14:37:41 hcartiaux@access(chaos-cluster) ~ $ curl -k "https://localhost/oarapi/jobs/1174702.yaml"
    ---
    api_timestamp: 1421148323
    array_id: 1174702
    array_index: 1
    command: ''
    cpuset_name: hcartiaux_1174702
    ...
    start_time: 1421148317
    state: Terminated
    stderr_file: OAR.1174702.stderr
    stdout_file: OAR.1174702.stdout
    stop_time: 1421148321
    submission_time: 1421148315
    type: INTERACTIVE
    types:
      - interactive
    walltime: 7200
    wanted_resources: "-l \"{type = 'default'}/core=1,walltime=2:0:0\" "
    
  • Get the list of nodes used by a job

    14:37:41 hcartiaux@access(chaos-cluster) ~ $ curl -k "https://localhost/oarapi/jobs/1174702/nodes.yaml"
    ---
    api_timestamp: 1421156272
    items:
      - api_timestamp: 1421156272
        links:
          - href: /oarapi/resources/nodes/e-cluster1-13
            rel: self
        network_address: e-cluster1-13
        status: assigned
    links:
      - href: /oarapi/jobs/1174702/nodes.yaml
        rel: self
    offset: 0
    total: 1
    
  • List the existing resources

    curl -k 'https://localhost/oarapi/resources.yaml?structure=simple'
    
    14:38:37 hcartiaux@access(chaos-cluster) ~ $ curl -k 'https://localhost/oarapi/resources.yaml?structure=simple' | head
    ---
    api_timestamp: 1421156321
    items:
      - api_timestamp: 1421156321
        available_upto: 0
        id: 1
        links:
          - href: /oarapi/resources/nodes/k-cluster1-1
            rel: member
            title: node
    ...
    
  • Submit a job

    14:38:42 hcartiaux@access(chaos-cluster) ~ $ curl -k -X POST https://localhost/oarapi/jobs.yaml -d 'resources=core=1&command=sleep 60&name=Test'
    ---
    api_timestamp: 1421156412
    cmd_output: |
      [ADMISSION RULE] Set default walltime to 7200.
      [ADMISSION RULE] Modify resource description with type constraints
      OAR_JOB_ID=1174742
    id: 1174742
    links:
      - href: /oarapi/jobs/1174742
        rel: self
    
  • Send the checkpoint signal to a running job

    15:12:51 hcartiaux@access(chaos-cluster) ~ $ curl -k -X POST https://localhost/oarapi/jobs/1174756/checkpoints/new.yaml
    ---
    api_timestamp: 1421158380
    cmd_output: |
      Checkpointing the job 1174756 ...DONE.
      The job 1174756 was notified to checkpoint itself on e-cluster1-13.
    id: 1174756
    links:
      - href: /oarapi/jobs/1174756
        rel: self
    status: Checkpoint request registered
    
  • Delete a job

    curl -k -X POST https://localhost/oarapi/jobs/1174754/deletions/new.yaml
    ---
    api_timestamp: 1421157682
    cmd_output: |
      Deleting the job = 1174754 ...REGISTERED.
      The job(s) [ 1174754 ] will be deleted in a near future.
    id: 1174754
    links:
      - href: /oarapi/jobs/1174754
        rel: self
    status: Delete request registered
    
  • Example using Ruby and rest-client (we submit a simple job and load its detailed information into a Hash)

    16:42:55 hcartiaux@access(chaos-cluster) ~ $ restclient https://localhost/oarapi
    irb(main):001:0> require 'pp'
    => true
    irb(main):002:0> result=YAML.load(post('jobs.yaml', {:command => "sleep 60"}))
    => {"cmd_output"=>"[ADMISSION RULE] Set default walltime to 7200. ...
    irb(main):003:0> pp(result)
    {"cmd_output"=>
      "[ADMISSION RULE] Set default walltime to 7200.\n[ADMISSION RULE] Modify resource description with type constraints\nOAR_JOB_ID=1174761\n",
     "id"=>1174761,
     "api_timestamp"=>1421163831,
     "links"=>[{"href"=>"/oarapi/jobs/1174761", "rel"=>"self"}]}
    => nil
    irb(main):004:0> result=YAML.load(get('jobs/1174761.yaml'))
    => {"types"=>[], "start_time"=>1421163832, ...
    irb(main):005:0> pp(result)
    {"types"=>[],
     "start_time"=>1421163832,
     "properties"=>"(bigmem='NO' AND bigsmp='NO') AND dedicated='NO'",
     "scheduled_start"=>nil,
     "dependencies"=>[],
     "resubmit_job_id"=>0,
     "reservation"=>"None",
     "exit_code"=>0,
     "command"=>"sleep 60",
     "stop_time"=>1421163893,
     "owner"=>"hcartiaux",
     .........
     "type"=>"PASSIVE",
     "stdout_file"=>"OAR.1174761.stdout",
     "array_id"=>1174761}
    => nil
    irb(main):006:0>
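
The curl calls above can also be combined in a shell script, for instance to wait for the end of a job. The sketch below is a rough illustration under several assumptions: the helper names and the OAR_API variable are hypothetical, yaml_field only handles the flat top-level "key: value" lines shown in the outputs above (it is not a YAML parser), and Error is assumed to be the final state of a failed job.

```shell
# Base URL of the API, as given above; OAR_API is a hypothetical variable.
OAR_API="${OAR_API:-https://localhost/oarapi}"

# yaml_field KEY: print the value of the first top-level "KEY: value" line
# read from standard input. Indented (nested) keys are ignored.
yaml_field() {
    awk -v key="$1" -F': ' '$1 == key { print $2; exit }'
}

# job_state JOBID: query the API and print the state of the job.
job_state() {
    curl -sk "$OAR_API/jobs/$1.yaml" | yaml_field state
}

# wait_for_job JOBID [INTERVAL]: poll every INTERVAL seconds (default 30)
# until the job is finished, then print its final state.
wait_for_job() {
    while :; do
        state=$(job_state "$1")
        case "$state" in
            Terminated|Error) echo "$state"; return 0 ;;
            "") return 1 ;;   # no answer from the API
        esac
        sleep "${2:-30}"
    done
}

# Example: submit a job, extract its ID from the answer, wait for its end
#   id=$(curl -sk -X POST "$OAR_API/jobs.yaml" \
#            -d 'resources=core=1&command=sleep 60&name=Test' | yaml_field id)
#   wait_for_job "$id" 10
```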
    

Troubleshooting

Manage your environment / load modules