
HPC @ Uni.lu

High Performance Computing in Luxembourg

Your usage of the UL HPC platform will involve running applications on the compute nodes. Please refer to Getting started for a more detailed guide.

This guide assumes that you:

  • have an account with an SSH public key (you must connect with public key authentication). See the Get an account page to request an account.
  • have configured an SSH client on your machine. See Access to HPC cluster to learn how to do this.

Access to HPC platform

This guide connects to the iris cluster. To connect to another cluster, replace iris with gaia or chaos.

Linux/BSD/macOS

  • Open your terminal application
  • At the prompt, type:
ssh -p 8022 yourlogin@access-iris.uni.lu
  • You should be connected to iris
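
To avoid retyping the port and login each time, the same settings can be stored in an OpenSSH client configuration entry. The sketch below only prints such an entry so it can be reviewed first; the alias `iris-cluster` and the helper name `ssh_stanza` are hypothetical, and `yourlogin` is the same placeholder used above.

```shell
# Print an OpenSSH client stanza for the Iris access node.
# "iris-cluster" is a hypothetical alias; "yourlogin" is a placeholder.
ssh_stanza() {
    cat <<EOF
Host iris-cluster
    HostName access-iris.uni.lu
    Port 8022
    User ${1:-yourlogin}
EOF
}

ssh_stanza yourlogin
# After reviewing the output, it could be appended to ~/.ssh/config:
#   ssh_stanza yourlogin >> ~/.ssh/config
# and the connection then shortens to:
#   ssh iris-cluster
```

`Host`, `HostName`, `Port` and `User` are standard OpenSSH client keywords; everything else in the block is illustrative.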

Windows

The example below uses PuTTY. Please adapt it to your own SSH client.

  • Open your SSH client application.
  • Enter these settings:
    • In Category:Session :
      • Host Name: access-iris.uni.lu
      • Port: 8022
      • Connection Type: SSH (leave as default)
    • In Category:Connection:Data :
      • Auto-login username: yourlogin
  • Click the Open button.
  • You should be connected to iris-cluster.

Results

At the end of this step, the following welcome banner should appear:

==================================================================================
 Welcome to access2.iris-cluster.uni.lux
==================================================================================
                          _                         ____
                         / \   ___ ___ ___  ___ ___|___ \
                        / _ \ / __/ __/ _ \/ __/ __| __) |
                       / ___ \ (_| (_|  __/\__ \__ \/ __/
                      /_/   \_\___\___\___||___/___/_____|
               _____      _        ____ _           _          __
              / /_ _|_ __(_)___   / ___| |_   _ ___| |_ ___ _ _\ \
             | | | || '__| / __| | |   | | | | / __| __/ _ \ '__| |
             | | | || |  | \__ \ | |___| | |_| \__ \ ||  __/ |  | |
             | ||___|_|  |_|___/  \____|_|\__,_|___/\__\___|_|  | |
              \_\                                              /_/
==================================================================================

=== Computing Nodes ========================================= #RAM/n === #Cores ==
 iris-[001-108] 108 Dell C6320 (2 Xeon E5-2680v4@2.4GHz [14c/120W]) 128GB  3024
 iris-[109-168]  60 Dell C6420 (2 Xeon Gold 6132@2.6GHz [14c/140W]) 128GB  1680
 iris-[169-186]  18 Dell C4140 (2 Xeon Gold 6132@2.6GHz [14c/140W]) 768GB   504
                +72 GPU  (4 Tesla V100 [5120c CUDA + 640c Tensor])   32GB +368640
 iris-[187-190]   4 Dell R840 (4 Xeon Platin.8180M@2.5GHz [28c/205W]) 3TB   448
==================================================================================
  *** TOTAL: 190 nodes, 5656 cores + 368640 CUDA cores + 46080 Tensor cores ***

 Fast interconnect using InfiniBand EDR 100 Gb/s technology
 Shared Storage (raw capacity): 2284 TB (GPFS) + 1300 TB (Lustre) = 3584 TB

 Support (in this order!)                       Platform notifications
   - User DOC ........ http://hpc.uni.lu/docs    - hpc-platform@uni.lu
   - FAQ ............. http://hpc.uni.lu/faq     - Twitter: @ULHPC
   - Mailing-list .... hpc-users@uni.lu
   - Bug reports ..... hpc-tracker.uni.lu
   - Admins .......... hpc-sysadmins@uni.lu

/!\ #961 - Perf. validation of 6 new GPU nodes performed over night /!\

==================================================================================
 /!\ NEVER COMPILE OR RUN YOUR PROGRAMS FROM THIS FRONTEND !
     First reserve your nodes (using srun/sbatch(1))
Linux access2.iris-cluster.uni.lux 3.10.0-693.21.1.el7.x86_64 x86_64
 16:25:06 up 60 days, 23:05, 55 users,  load average: 0.15, 0.22, 0.23
0 16:25:06 yourlogin@access2(iris-cluster) ~ $

If you have trouble accessing the cluster, please have a look at the detailed access guide.

Reserve a core

Kindly note that you should never run compute/memory/disk intensive applications on a cluster’s frontend.

On access-iris, reserve one core with Slurm.

0 16:27:19 yourlogin@access2(iris-cluster) ~ $ si
[SLURM] SLURM_JOB_ID=407753
[SLURM] Your nodes are:
      iris-001*1
0 16:27:26 hcartiaux@iris-001(iris-cluster)[SLURM407753->59] ~ $

You are now connected to a compute node, and your tasks are restricted to the reserved core.

If you have trouble reserving a node, please have a look at the detailed scheduler guide here.
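
The `si` command used above is a shortcut available on the cluster. Assuming it wraps Slurm's `srun` (the welcome banner points to `srun`/`sbatch`), a roughly equivalent generic request might look like the sketch below. The helper name `interactive_cmd` and the 30-minute default walltime are assumptions; any partition or QOS options the site shortcut may add are deliberately left out.

```shell
# Print (do not run) a generic Slurm request for one core with an interactive shell.
# The 30-minute default walltime is an assumption, not a site setting.
interactive_cmd() {
    echo "srun --nodes=1 --ntasks=1 --cpus-per-task=1 --time=${1:-0:30:00} --pty bash -i"
}

interactive_cmd            # inspect the command first
# On the frontend, it could then be run with:
#   eval "$(interactive_cmd 1:00:00)"
```

Printing the command rather than running it keeps the sketch harmless on machines without Slurm and makes it easy to adjust the options before use.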

Use module to load an application and its dependencies

Have a look at the applications we provide. Load the appropriate module (for example MATLAB):

0 16:27:26 yourlogin@iris-001(iris-cluster)[SLURM407753->59] ~ $ module load base/MATLAB
0 16:27:56 yourlogin@iris-001(iris-cluster)[SLURM407753->59] ~ $ module list

Currently Loaded Modules:
  1) base/MATLAB/2018a

If you have trouble using module, please have a look at the detailed guide.
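
The load-and-check sequence above can be wrapped in a small helper for reuse in scripts. This is a minimal sketch: `load_app` is a hypothetical name, and the guard simply assumes that a missing `module` command means the shell is not set up for modules.

```shell
# load_app is a hypothetical helper, not part of the module system.
load_app() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; are you on a compute node?" >&2
        return 1
    fi
    module purge            # start from a clean environment
    module load "$1"        # e.g. base/MATLAB
    module list             # confirm what is loaded
}

load_app base/MATLAB || echo "skipped: no module command in this shell"
```

`module avail` lists the modules that can be loaded, which helps when the exact module name is not known.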

Run your application

Run MATLAB:

0 16:27:58 yourlogin@iris-001(iris-cluster)[SLURM407753->59] ~ $ matlab -nodisplay
Opening log file:  /home/users/hcartiaux/java.log.30062

                                               < M A T L A B (R) >
                                     Copyright 1984-2018 The MathWorks, Inc.
                                      R2018a (9.4.0.813654) 64-bit (glnxa64)
                                                February 23, 2018


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

>> 1+1

ans =

     2

>>

If you have trouble running your application, or you want to run your own application, please have a look at the detailed compiling guide or the software installation guide.

Exit your application and job

Exit MATLAB (exit command) and exit your shell with Ctrl-<D>.

Congratulations, you have run your first job on the Iris cluster!
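
The interactive steps above (reserve a core, load the module, run MATLAB, exit) can also be written down as a batch script for `sbatch`, which the welcome banner mentions alongside `srun`. The sketch below only writes and syntax-checks such a script; the job name, walltime, output file pattern and script location are assumptions, not site recommendations.

```shell
# Write a hypothetical one-core batch script to a temporary location.
job_script="${TMPDIR:-/tmp}/matlab_quickstart.sh"
cat > "$job_script" <<'EOF'
#!/bin/bash -l
#SBATCH --job-name=matlab-quickstart
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
#SBATCH --output=matlab-quickstart-%j.out

module load base/MATLAB
# Same computation as the interactive session, then quit MATLAB.
matlab -nodisplay -r "disp(1+1); exit"
EOF

bash -n "$job_script" && echo "wrote $job_script"
# On the frontend, the job could then be submitted with:
#   sbatch "$job_script"
```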

What’s next?

Tutorials

We are continuously enhancing a set of tutorials (sources). Give them a try!

Getting started

Have a look at the Getting started page to know how to use the UL HPC platform.

TOP10 Best Practices

Using scalable HPC systems in harmony (at Uni.lu and beyond):

Fundamentals

  • Be a good HPC citizen: respect the Acceptable Use Policy and report identified, reproducible issues via the ticketing system at your earliest convenience.
  • Read the documentation thoroughly and try the known path first; reuse existing (and tested) launch scripts for job submission in the queueing system.
  • Read about and apply standard HPC techniques and practices, as found in the training material of HPC sites, e.g. NCSA CyberIntegrator (at least check the content index; it will certainly become useful in the future).
  • Ensure proper disk sizing, backup and redundancy levels for your application; declare a project if your needs are special and require particular attention or a special allocation. Allocation is always conditional on resource availability and may involve costs for you if your needs are too special.
  • Consider sysadmin time planning: all incoming issues have to be prioritized according to their impact on the user community. Use the ticketing system.

Nice to have

  • Reuse existing optimized libraries and applications wherever possible (e.g. modules: MPI, compilers, libraries).
  • Make your scripts generic (respect any defined directory structure and apply staging techniques where needed); use variables instead of hardcoded full path names; remember that any HPC system may be modified, upgraded or simply replaced before your project finishes.
  • Take advantage of modules to manage multiple versions of software, even for your own use.
  • Take advantage of EasyBuild to organize software from multiple sources, whether your own or third-party. This is especially important for code expected to run across multiple architectures and be rebuilt in multiple contexts.
  • Identify the policy class your tasks belong to and make the most efficient use of your allocation; avoid underutilizing an allocation, as this harms other users by increasing queueing; monitor your jobs via the ganglia plots for both chaos and gaia.
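
The advice on generic scripts and avoiding hardcoded paths can be illustrated with a short sketch. All directory names and variable names below are hypothetical; the point is only that every location derives from one overridable root, so moving to another system means changing a single value.

```shell
# Hypothetical project layout: every path derives from PROJECT_ROOT,
# which can be overridden from the environment without editing the script.
PROJECT_ROOT="${PROJECT_ROOT:-$HOME/myproject}"
INPUT_DIR="${INPUT_DIR:-$PROJECT_ROOT/input}"
OUTPUT_DIR="${OUTPUT_DIR:-$PROJECT_ROOT/output}"

echo "reading from $INPUT_DIR, writing to $OUTPUT_DIR"
```

Running the script as `PROJECT_ROOT=/some/other/place ./script.sh` would redirect both directories at once.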

Hints & Tips

Make your life easier

  • Do code versioning for the sources or scripts you develop (e.g. GitHub or GForge). For instance, do you have a history of all last month’s revisions? What happens if you inadvertently overwrite a 20KB source file right before a paper submission deadline?
  • Do some form of checkpointing if your individual jobs run for more than 1 day; the advantages are plenty and it is a major aspect of code quality; see checkpointing info online and remember that OAR can send a signal to checkpoint your job before it reaches walltime termination.
  • Keep a standard example, e.g. “Hello World”, ready in case you need to do differential debugging on a suspected system problem. Use it as a reference in your ticket if you spot problems with it; it helps communication remain relevant and effective. More generally, when you report a bug in a complex software tree, reduce it to the essentials.
  • Avoid looking for hacks to overcome existing policies; rather, document your need and the rationale behind it and propose it as a “project”; it makes more sense for everybody.
  • Take advantage of GPU technology or other architectures if applicable in your case; be careful with the GPU vs cores speedup ratios (such user reports are always welcome, and you are encouraged to share the results on the hpc-users list, even if they are not favourable).
  • If you have a massive workflow of jobs to manage, do not reinvent the wheel: contact the sysadmins and fellow users (hpc-users list) to ask for advice on your approach and collect ideas.
  • Report any plans to scale within HPC systems in any non-trivial way as early as possible; it helps both sides prepare and avoids frustration.
  • Unless you have your own reasons, opt for a scripting language for code integration but a faster, optimized language for the “application kernel” (to obtain both maintainability and performance). Many computational kernels are readily usable from within scripting languages (examples: NumPy, SciPy).
  • If you have deadlines to adhere to, kindly give notice early on; you may not be alone. The sysadmin team works on a best-effort basis and will try to satisfy user needs where possible, with the proviso that not all requests can be fulfilled.
  • If you find techniques that you consider elegant and relevant to other users’ work, you are welcome to share them on the HPC users’ mailing list!
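
The checkpointing tip above can be sketched in a few lines of shell. This is an illustration under stated assumptions, not a scheduler recipe: `run_steps` is a hypothetical helper, the state file format (a single step counter) is invented for the example, and `USR1` is an assumed signal; which signal the scheduler sends, and how to request it, depends on the scheduler and is not shown.

```shell
# run_steps STATE_FILE TOTAL: hypothetical helper that resumes from STATE_FILE.
run_steps() {
    state_file="$1"; total="$2"; step=0
    if [ -f "$state_file" ]; then
        step="$(cat "$state_file")"     # resume where the previous run stopped
    fi
    # Save progress if the job is signalled before the walltime (USR1 is an assumption).
    trap 'echo "$step" > "$state_file"; exit 0' USR1
    while [ "$step" -lt "$total" ]; do
        step=$((step + 1))              # stand-in for one unit of real work
        echo "$step" > "$state_file"    # periodic checkpoint
    done
    trap - USR1
    echo "finished at step $step"
}

state="$(mktemp)"; echo 3 > "$state"    # pretend a previous run stopped after step 3
run_steps "$state" 5                    # prints "finished at step 5"
```

Real applications would save their own state (arrays, iteration counters, random seeds) instead of a single number, but the resume-then-continue structure stays the same.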
