
HPC @ Uni.lu

High Performance Computing in Luxembourg

This page aims to help our users report their issues efficiently. Following the procedures below will help us diagnose and resolve your problem faster.

This page is largely inspired by the HLRN website.

Support organization

Support is provided by the HPC system administrators.

You can email us at hpc-sysadmins@uni.lu. Please use this collective address rather than writing to individual administrators: you will likely get an answer faster.

Even better, open a ticket on hpc-tracker.uni.lu; otherwise it is difficult for us to keep a proper timeline of your request. We receive tens to hundreds of emails each day, and you know how that goes…

First checks

Is your issue already discussed and documented?

Please read the documentation first. Yes, please do :)

Maintenance

The clusters may be down periodically for software and hardware upgrades. Users are usually notified of planned maintenance windows and unplanned incidents via the hpc-platform mailing list and Twitter.

Scheduling issues

Read the OAR documentation. If your job is not scheduled as you expect, please check drawgantt and monika.

System issues

If you do not understand how your jobs behave, use ganglia to analyze the information related to the nodes they are running on.

In particular, check the memory usage: if your job consumes too much memory, the kernel's Out-Of-Memory (OOM) killer will kill it. This problem is described in the OAR documentation.
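If you want to measure a program's memory footprint yourself, here is a minimal Python sketch. It assumes a Linux node, where `ru_maxrss` is reported in KiB (macOS reports it in bytes). The function `run_and_report_memory` and the script name `memcheck.py` are hypothetical and not part of the HPC@Uni.lu tooling. The sketch runs a command as a child process and reports its peak resident memory. It also flags a SIGKILL, which is the signal the OOM killer sends.

```python
import resource
import signal
import subprocess
import sys


def run_and_report_memory(cmd):
    """Run cmd as a child process and return (returncode, peak_rss_kib).

    Assumption: Linux, where ru_maxrss is in KiB. For RUSAGE_CHILDREN it is
    the largest peak among all terminated children, so measure one command
    per invocation.
    """
    proc = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 memcheck.py ./my_program arg1 arg2
    code, peak_kib = run_and_report_memory(sys.argv[1:])
    print(f"exit code: {code}, peak RSS: {peak_kib / 1024:.1f} MiB")
    # A child killed by a signal gets a negative return code; the OOM
    # killer sends SIGKILL, so -9 is a hint (not proof) of an OOM kill.
    if code == -signal.SIGKILL:
        print("child received SIGKILL: possibly the Out-Of-Memory killer")
```

A SIGKILL can also come from other sources, such as a manual kill or the scheduler enforcing a walltime. Cross-check the result with ganglia before reporting an OOM problem.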

How to determine and report problems

Problem severity

High-severity problems are those that significantly impact all users' ability to access the clusters, compile programs, run jobs or analyse data. If a system component (file system, network, critical software) appears to be malfunctioning and prevents you from working at all, the problem is most likely of high severity. Please open an issue on hpc-tracker.uni.lu (preferred) and/or send an email to hpc-sysadmins@uni.lu immediately. Be prepared to provide detailed information about the problem, as described below.

If the problem you are experiencing affects only you, or is intermittent, not reproducible, or application-specific, it is most likely not of high severity. Please document it (see below) and open a ticket on hpc-tracker promptly, while the information is still relevant: a report of an issue received many months later helps nobody.

Guidelines for problem description

You can greatly speed up isolating and correcting the errors you encounter on the clusters by providing complete and detailed information, but no more than that ;-), as explained in the following sections.

General questions

  • Who? - Name and user ID (login), and project name if applicable
  • When? - When did the problem occur?
  • Where? - Which cluster? Which node? Which job?
  • What? - What happened? What exactly were you doing or trying to do?
  • Which errors or problems occurred? What were the symptoms? Please report system or software messages verbatim.
  • Is the problem reproducible? If so, how?
  • Were there any recent changes in your environment that could have triggered the problem?
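The checklist above can be turned into a reusable template. Below is a minimal Python sketch; the function `build_report` and its parameter names are hypothetical. The login and host name are read from the machine where the script runs, which may not be the node where the problem occurred, so `node` can be overridden.

```python
import getpass
import socket


def build_report(what, error_message, reproducible, recent_changes,
                 cluster, when, node=None, job_id=None, project=None):
    """Return a plain-text problem report following the Who/When/Where/What checklist."""
    try:
        login = getpass.getuser()
    except (KeyError, OSError):  # no user entry available (e.g. some containers)
        login = "unknown"
    who = login + (f" (project: {project})" if project else "")
    where = (f"cluster={cluster}, node={node or socket.gethostname()}, "
             f"job={job_id or 'n/a'}")
    lines = [
        f"Who: {who}",
        f"When: {when}",
        f"Where: {where}",
        f"What: {what}",
        "Error message (verbatim):",
        error_message,  # paste the exact output, do not paraphrase
        f"Reproducible: {reproducible}",
        f"Recent changes: {recent_changes}",
    ]
    return "\n".join(lines)
```

You can paste the returned text directly into a ticket on hpc-tracker.uni.lu and then add your name.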

Special types of problems

  • Access problems:

    • Has a scheduled maintenance been announced? Have you checked the recent mails from the hpc-platform mailing list?
    • Which cluster are you trying to access?
    • From which computer did you try to access the cluster? IP address? Type of operating system? Which SSH client and version? Do you use your work computer or a personal device?
    • From where? Are you within the University of Luxembourg or outside?
  • Compiler or run-time errors: do you think they are related to the local system configuration or to a hardware issue?

    • Yes: open a ticket and describe your issue.
    • No: unfortunately, we cannot provide in-depth support at the application/programming level. You are strongly encouraged to use the upstream mailing lists and/or to ask our community of users at hpc-users@uni.lu.
  • I/O and file system problems:

    • Which file system do you use?
    • Which directory and file(s)?
    • Corrupt files: do not remove or overwrite possibly corrupt files. Move them to another place on the same file system instead (using the mv command). Provide the following detailed information:

      • Exact path for the file
      • Date and time of the last successful access to the file
      • Where was the file created (which node(s))?
      • When was the file created? (Creation date and time of the file)
      • How was the file created (manually, software)?
      • Was the file moved or copied since creation?
      • Was the file modified since creation?
  • Batch system (OAR) problems, please provide:

    • the OAR job ID
    • the submit, start and finish times
    • the job output or runtime error message
  • HPC@Uni.lu website:

    • URL of the page
    • What is the problem (error, lack of precision, incomplete information, etc.)?
    • Which browser (and version) do you use?
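The following Python sketch covers the corrupt-file case. It moves a file aside and records the metadata that the system can provide. The function `quarantine` is hypothetical and not an official tool. It uses `os.rename`, which behaves like `mv` within one file system: it renames without copying. Across file systems it fails with an `OSError`, so the file is never silently duplicated or rewritten. Standard Linux `stat` data has no creation time, because `st_ctime` is the inode change time. Access times may also be imprecise on file systems mounted with `relatime` or `noatime`. You therefore still need to supply the creation details yourself.

```python
import os
import time
from pathlib import Path


def quarantine(path, quarantine_dir):
    """Move a possibly corrupt file to quarantine_dir (same file system) and
    return its metadata for the ticket. Never deletes or overwrites."""
    src = Path(path).resolve()
    st = src.stat()  # collect metadata before touching the file
    dest_dir = Path(quarantine_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    # On POSIX, os.rename silently replaces an existing target, so refuse
    # explicitly instead of overwriting.
    if dest.exists():
        raise FileExistsError(str(dest))
    # Raises OSError (EXDEV) if dest_dir is on another file system.
    os.rename(src, dest)
    return {
        "original_path": str(src),
        "new_path": str(dest.resolve()),
        "size_bytes": st.st_size,
        "last_modified": time.ctime(st.st_mtime),
        "last_accessed": time.ctime(st.st_atime),  # may be imprecise (relatime/noatime)
    }
```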