The Covid-19 pandemic affected our daily lives in multiple ways. With regard to the ULHPC facility, the main visible impact for you was certainly the postponed installation of the new cluster aion (initially planned for April and currently scheduled for November). Besides the increased load on the iris cluster, which reached its capacity limits and may have degraded your computing experience through increased job slowdown and wait times, you probably noticed that several maintenance operations involving external partners were made more complex by the safety regulations. This also affected the preparation work performed on the premises of the Centre de Calcul to host the new system.
The installation of the new cluster, together with the changes in the European HPC ecosystem and in the ULHPC team at the beginning of the year, was the occasion for a global reflection on the ULHPC policy and configuration. It was conducted by the ULHPC team in close collaboration with the rectorate and the central administration. Here is a summary of the upcoming operations and changes planned by the end of the year as a result of this reflection:
Implementation of the updated Slurm configuration (2.0)
This operation is planned during the upcoming short iris maintenance, scheduled for Thursday, Oct. 22, 2020.
This new configuration will drastically change the QOS configuration in favor of a simpler and more consistent setup based on priority. The updated fairshare algorithm, configuration and limits will allow for a fairer and more transparent allocation policy, compliant with EuroHPC and PRACE recommendations as well as with the new cost model and ULHPC resource allocation policy for research projects and external partners (approved by the rectorate on July 7, 2020).
The qos-* QOS will be removed.
As this will affect all your launchers, we will allow for a transition period during which some of the current QOS (i.e. qos-batch*) will remain operational. This transition period will last until the aion cluster is released for general availability.
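To prepare for the removal, it may help to list the launchers that still request a qos-* QOS. The following is only a minimal sketch, not an official ULHPC tool: the function names, the "*.sh" file pattern and the scanned directory are assumptions made for illustration, and it only looks for the standard Slurm "--qos" / "-q" options.

```python
import re
from pathlib import Path

# Matches Slurm QOS requests that use a legacy "qos-" name, for example
# "#SBATCH --qos=qos-batch", "--qos qos-batch" or "-q qos-batch".
LEGACY_QOS = re.compile(r"(?:--qos[=\s]+|-q\s+)(qos-[\w-]+)")


def find_legacy_qos(text):
    """Return (line_number, qos_name) pairs for each legacy request in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in LEGACY_QOS.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def scan_launchers(root, pattern="*.sh"):
    """Scan launcher scripts below root (hypothetical helper).

    The "*.sh" pattern is an assumption; adapt it to how your
    launchers are actually named.
    """
    report = {}
    for path in sorted(Path(root).rglob(pattern)):
        hits = find_legacy_qos(path.read_text(errors="replace"))
        if hits:
            report[str(path)] = hits
    return report
```

Calling scan_launchers on the directory holding your launchers returns a dictionary mapping each affected script to the lines that need updating once the new QOS names are published.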
Implementation of the new User Software build policy (RESIF3)
We have reworked our internal framework (called RESIF) designed to generate the ULHPC User Software environment. RESIF 2 (in place since 2017) led to a complex workflow while diverging from, and breaking compliance with, the upstream easybuild-easyconfigs developments. The explosion of custom easyconfigs (often hardly justified) and the absence of contributions back to the EasyBuild (EB) community made it necessary to completely redesign our framework.
Initiated in May, RESIF3 was completed and successfully validated to generate the full 2019b software set using a novel collaborative workflow.
It will become the default software environment on iris on Monday, Nov. 30, 2020. The 2020a release planned for both iris and aion is nearly complete on both Intel and AMD architectures and will be deployed together with the new cluster.
New ULHPC Websites
See blog entry: the ULHPC websites have been reworked to offer both fully up-to-date content and a simplified (and nicer) web experience based on the latest web technologies. They will be released by the end of the month.
The installation of the aion cluster will be, of course, the most critical and complex operation scheduled for November.
This operation will be performed in 3 phases:
- Installation and configuration of the new aion supercomputer, composed of 318 compute nodes hosted within a compute cell made of 4 adjacent BullSequana XH2000 racks. It will be installed in a specialized server room designed for hosting compute equipment supporting Direct Liquid Cooling (DLC) through a separate high-temperature water circuit, thus guaranteeing unprecedented energy efficiency and equipment density.
- Adaptation and extension of the existing High-Performance Storage systems
- Adaptation of the network (Ethernet and IB) to integrate the new cluster within the existing Ethernet/Infiniband-based data and management networks, which involves the extension and consolidation of the current Ethernet and Infiniband topology.
ULHPC School 2020
You will be happy to know that the ULHPC School 2020 is now confirmed for Dec. 15-16, 2020. It will be held in a hybrid teaching mode, and we will soon open the registration portal with more details and the final program.
Migration to Service Now
As announced, the migration of the HPC support tickets to Service Now has been effective since Monday, Oct 5, 2020. We thus kindly ask you to use https://hpc.uni.lu/support to quickly access the HPC catalog on the Service Now portal, and to stop using the ULHPC Tracker portal, which will be decommissioned after 8 years of service.
Finally, you may have followed several changes in the composition of the HPC team since the beginning of the year. Regarding the recent arrivals, we are happy to welcome:
- Mr Teddy Valette and Mr Abatcha Olloh, who joined us as Infrastructure and HPC Architect Engineers in September;
- Mrs Arlyne Moinier-Vandeventer, who will start as Project Manager on October 15, 2020, to coordinate the EuroHPC Competence Center (EuroCC) for the University;
- Dr. Loizos Koutsantonis, who will also start soon as PostDoctoral researcher for the EuroCC project. Note that the second Postdoc position tied to EuroCC has been filled and the selected candidate will be announced later this year.
On the HR side, please note that we currently have 3 open Postdoc positions in HPC/HPDA. Kindly forward the information to your network if you know of potentially interested candidates.