HPC @ Uni.lu

High Performance Computing in Luxembourg

EOY HPC Operations (Q4 2020)

The Covid-19 pandemic has affected our daily lives in multiple ways. With regard to the ULHPC facility, the most visible impact for you was certainly the postponed installation of the new cluster aion (initially planned for April and currently scheduled for November). Besides the increased load on the iris cluster, which reached its capacity limits and may have degraded your computing experience through longer job slowdown and wait times, you probably noticed that several maintenance operations involving external partners were made more complex by the safety regulations. These regulations also affected the preparation work performed on the premises of the Centre de Calcul to host the new system.

This difficult period is hopefully coming to an end, and we now have better visibility on the last actions required to reach the ambitious milestones set for this year.

Indeed, the installation of the new cluster, together with the changes in the European HPC ecosystem and in the ULHPC team at the beginning of the year, was the occasion for a global reflection on the ULHPC policy and configuration. It was conducted by the ULHPC team in close collaboration with the rectorate and the central administration. Here is a summary of the upcoming operations and changes planned by the end of the year as a result of this reflection:

Implementation of the updated Slurm configuration (2.0)

This operation is planned for the upcoming short iris maintenance, scheduled for Thursday, Oct. 22, 2020.

This new configuration will drastically change the QOS configuration in favor of a simpler and more consistent setup based on priority. The updated fairshare algorithm, configuration and limits will allow for a fairer and more transparent allocation policy, compliant with EuroHPC and PRACE recommendations as well as with the new cost model and ULHPC resource allocation policy for research projects and external partners (approved by the rectorate on July 7, 2020).

Important: in particular, ALL qos-* QOS will be removed.
As this will affect all your launchers, we will allow for a transition period during which some of the current QOS (i.e. qos-batch*) will remain operational. This transition period will last until the aion cluster is released for general availability.
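
To get a quick overview of which launchers still request one of the qos-* QOS, a small script along the following lines may help. It is only a minimal sketch and not an official ULHPC tool: the function name find_legacy_qos and the sample QOS name qos-example are hypothetical, and only the long --qos option (written as --qos=NAME or --qos NAME) is recognised.

```python
import re
import sys
from pathlib import Path

# Captures NAME in "--qos=NAME" or "--qos NAME" (long option only).
QOS_OPTION = re.compile(r"--qos(?:=|\s+)([\w-]+)")


def find_legacy_qos(launcher_text):
    """Return (line_number, qos_name) pairs for every QOS named qos-*."""
    hits = []
    for number, line in enumerate(launcher_text.splitlines(), start=1):
        for name in QOS_OPTION.findall(line):
            if name.startswith("qos-"):
                hits.append((number, name))
    return hits


if __name__ == "__main__":
    # Usage (script name is hypothetical): python check_qos.py launcher1.sh ...
    for path in sys.argv[1:]:
        for number, name in find_legacy_qos(Path(path).read_text()):
            print(f"{path}:{number}: uses legacy QOS {name}")
```

The script only reports the affected lines; it deliberately does not rewrite them, since the replacement QOS depends on the new configuration.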

Implementation of the new User Software build policy (RESIF3)

We have reworked our internal framework (called RESIF) designed to generate the ULHPC User Software environment. Indeed, RESIF 2 (in place since 2017) led to a complex workflow that diverged from, and broke compliance with, the mainstream easybuild-easyconfigs developments. The explosion of custom easyconfigs (often hard to justify) and the absence of contributions back to the EasyBuild (EB) community made it necessary to completely redesign our framework.

Initiated in May, RESIF3 has been completed and successfully validated by generating the full 2019b software set using a novel collaborative workflow. It will become the default software environment on iris on Monday, Nov. 30, 2020. The 2020a release planned for both aion and iris is nearly complete on both Intel and AMD architectures and will be deployed together with the new cluster.

New ULHPC Websites

See the blog entry: the ULHPC websites have been reworked to offer fully up-to-date content as well as a simpler (and nicer) web experience based on the latest web technologies. They will be released by the end of the month.

Aion installation

The installation of the aion cluster will of course be the most critical and complex operation scheduled for November. This operation will be performed in 3 phases:

  1. Installation and configuration of the new aion supercomputer, composed of 318 compute nodes hosted within a compute cell made of 4 adjacent BullSequana XH2000 racks. It will be installed in a specialized server room designed for hosting compute equipment supporting Direct Liquid Cooling (DLC) through a separate high temperature water circuit, thus guaranteeing unprecedented energy efficiency and equipment density.
  2. Adaptation and extension of the existing High-Performance Storage systems.
  3. Adaptation of the network (Ethernet and IB) to integrate the new cluster within the existing Ethernet/InfiniBand-based data and management networks, which involves the extension and consolidation of the current Ethernet and InfiniBand topology.

Unfortunately, these operations will impose a LONG downtime period for the iris cluster in November, because central components (network, power and cooling) will be affected across the CDC during and after the installation. Between 2 and 3 weeks of downtime are anticipated, and we are waiting for the latest updates on the COVID-19 crisis mitigation policy in place within our external support teams (i.e. Atos and DDN) before sharing the final dates set for this long maintenance.

ULHPC School 2020

You will be happy to know that the ULHPC School 2020 is now confirmed for Dec. 15-16, 2020. It will be held in a hybrid teaching mode, and we will soon open the registration portal with more details together with the final program.

Migration to Service Now

As announced, the migration of the HPC support tickets to Service Now has been effective since Monday, Oct. 5, 2020. We thus kindly ask you to use https://hpc.uni.lu/support to quickly access the HPC catalog on the Service Now portal, and to stop using the ULHPC Tracker portal, which will be decommissioned after 8 years of service.

Newcomers

Finally, you may have followed several changes in the composition of the HPC team since the beginning of the year. With regard to the recent arrivals, we are happy to welcome:

  • Mr Teddy Valette and Mr Abatcha Olloh, who joined us as Infrastructure and HPC Architect Engineers in September;
  • Mrs Arlyne Moinier-Vandeventer, who will start on October 15, 2020 as Project Manager to coordinate the EuroHPC Competence Center (EuroCC) for the University;
  • Dr. Loizos Koutsantonis, who will also start soon as postdoctoral researcher for the EuroCC project. Note that the second Postdoc position tied to EuroCC has been filled and the selected candidate will be announced in due time later this year.

While on HR aspects, be aware that we currently have 3 open Postdoc positions in HPC/HPDA. Kindly forward the information to your network if you know of potentially interested candidates.