To compensate for the loss of computing capacity induced by the decommissioning of the `chaos` clusters, we issued at the end of last year a European call for tender (RFP 190027) to acquire a new High Performance Computing (HPC) cluster, named `aion`. It relies on highly efficient Direct Liquid Cooling (DLC) equipment and will complement and integrate with the existing flagship HPC system `iris`, in production since June 2017 (see the last end-of-year newsletter).
The tender was released on TED (under reference TED72/2019-608787) on September 11, 2019. Several offers were received on October 29, 2019. They were analysed in detail by the UL HPC team together with the procurement and legal department of the University, leading to the official attribution of the market to Atos in December 2019.
In practice, this tender includes 3 lots:
- Lot 1: the new DLC `aion` supercomputer, composed of 318 compute nodes hosted within a compute cell made of 4 adjacent BullSequana XH2000 racks. It will be installed in a specialized server room designed for hosting compute equipment supporting Direct Liquid Cooling (DLC) through a separate high-temperature water circuit, thus guaranteeing unprecedented energy efficiency and equipment density.
  - Compute: 318 compute nodes, based on 106 BullSequana X2410 blades each comprising 3 compute nodes, each node featuring 2 AMD Epyc ROME 7H12 processors (64c @ 2.6GHz, TDP: 280W)
  - Fast Interconnect: HDR InfiniBand fabric in a fat-tree topology (2:1 blocking)
  - Associated servers and management stack
- Lot 2: Adaptation and extension of the existing High-Performance Storage systems. In particular, the usable storage capacity of the existing primary high-performance storage solution (SpectrumScale/GPFS filesystem) will be extended by 1720TB/1560TiB to reach a total of 4.41 PB.
- Lot 3: Adaptation of the network (Ethernet and IB)
  - integration of the new cluster within the existing Ethernet-based data and management networks, which involves the extension and consolidation of the current Ethernet topology.
  - adaptation and extension of the existing InfiniBand (IB) topology to allow for bridging the two networks (Iris ‘island’ and Aion ‘island’).
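The figures quoted for the lots can be cross-checked with a few lines of Python. This is only a sketch: the constant and function names are illustrative, and the conversion assumes decimal terabytes (10^12 bytes) and binary tebibytes (2^40 bytes).

```python
# Hardware figures taken from the Lot 1 description.
BLADES = 106            # BullSequana X2410 blades
NODES_PER_BLADE = 3     # compute nodes per blade
SOCKETS_PER_NODE = 2    # AMD Epyc 7H12 processors per node
CORES_PER_SOCKET = 64   # cores per processor

nodes = BLADES * NODES_PER_BLADE
cores = nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET


def tb_to_tib(terabytes: float) -> float:
    """Convert decimal terabytes (10**12 B) to binary tebibytes (2**40 B)."""
    return terabytes * 10**12 / 2**40


# Storage extension from the Lot 2 description.
extension_tib = tb_to_tib(1720)

print(f"{nodes} nodes, {cores} cores")
print(f"1720 TB is about {extension_tib:.0f} TiB")
```

The conversion gives roughly 1564 TiB, which the text rounds to 1560 TiB.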
An overview of the `aion` compute nodes is given in the table below.
| Node configuration | #Nodes | #Cores | Total RAM (GB) | Rpeak |
|---|---|---|---|---|
| 2x AMD Epyc 7H12 @ 2.6GHz (2x64c), 256GB RAM | 318 | 40704 | 81408 | 1.693 PF |
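The aggregate values in the table follow from the per-node configuration. The sketch below reproduces them; the figure of 16 double-precision floating-point operations per cycle and per core is an assumption (two 256-bit FMA units per Zen 2 core) that is not stated in the announcement, and the variable names are illustrative.

```python
# Per-node configuration, as listed in the table.
NODES = 318
CORES_PER_NODE = 2 * 64      # 2 sockets x 64 cores
RAM_PER_NODE_GB = 256
CLOCK_GHZ = 2.6

# Assumption (not from the announcement): 16 double-precision FLOPs per
# cycle per core, i.e. 2 FMA units x 4 doubles x 2 operations on Zen 2.
FLOPS_PER_CYCLE = 16

total_cores = NODES * CORES_PER_NODE
total_ram_gb = NODES * RAM_PER_NODE_GB

# GHz x FLOPs/cycle gives GFlops per core; divide by 1000 for TFlops.
rpeak_tflops = total_cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000

print(f"cores: {total_cores}, RAM: {total_ram_gb} GB")
print(f"theoretical peak: {rpeak_tflops:.0f} TFlops")
```

Under this assumption the theoretical peak comes out at about 1693 TFlops, that is roughly 1.69 PFlops, in line with the last column of the table.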
At our initial kickoff meeting (see the January UL newsletter), we planned a production release of the new cluster in May 2020, which would have brought the ULHPC facility in 2020 to the compute and storage capacities summarized in the charts below.
The initial schedule has since been significantly delayed by the COVID-19 crisis, which affected the factory preparation of the HPC components as well as the capacity to host the external companies supporting the UL in the preparation of the CDC server rooms.
The tentative schedule now targets a production release of the `aion` cluster in November-December 2020.
We will keep you up to date once installation operations start.
Update (Dec. 2020): After several postponements linked to COVID-19 regulations, the cluster was finally delivered just before Christmas. Installation operations started at that point and will continue in January 2021.