HPC @ Uni.lu

High Performance Computing in Luxembourg

Merry Christmas & Happy NY 2019

2018 was a surprising year in many respects and is now gently coming to its end. On behalf of the full Uni.lu HPC Team, we wish you a Merry Christmas and a Happy New Year 2019!

You will find below an overview of last year's activities, together with a couple of announcements that will impact your computing experience starting in 2019.

Uni.lu HPC Message of the Year TL;DR

  • With the new GPU and large-memory nodes added to iris (see next point), the total capacity of the Uni.lu HPC facility is as follows:
    • Computing peak capacity: 1037.8 TFlops (11060 CPU computing cores)
    • Shared storage capacity: 9866.4 TB
  • … in particular, with the new GPU and large-memory computing nodes currently being added to the iris cluster (see #923), the Peta-flops computing capacity milestone has been reached
  • As announced during the last Uni.lu HPC School:
    • the chaos and gaia clusters will be DECOMMISSIONED mid-2019
      • you will be given migration guidelines through a separate channel
      • prepare (if not yet done) to transition to iris!!!

2018 Activity Report

Since this year has been quite productive and fruitful, it is worth briefly reviewing the main achievements.

  • The new iris cluster celebrated its first anniversary in June 2018.
    • As of Dec. 2018, a total of 228,266 jobs have been scheduled on this cluster alone, corresponding to 22,168,992 CPU hours allocated
      • see sreport cluster -t hours Utilization Start=2017-06-01
    • the cluster was moved from its temporary hosting datacenter CDC S-01 to the HPC-dedicated CDC S-02 in Belval in February 2018.
  • As a reminder, the computing part of the iris cluster was built in a stepwise approach as follows:
    • RFP 160020 (attributed to Post/Dell in Nov 2016): 168 nodes, 4704 cores, 256 TFlops
      • (2017) 108 x Dell C6320, 128 GB RAM, 2 Intel Xeon E5-2680v4@2.4 GHz [2x14c]
      • (Q1 2018) 60 x Dell C6320, 128 GB RAM, 2 Intel Xeon Gold 6132@2.6 GHz [2x14c]
    • RFP 180027 (attributed to DimensionData/Dell in Oct. 2018): 22 GPU/Bigmem nodes, 952 cores, +77.77 TFlops
      • (Q4 2018) 18 x Dell C4140 (GPU nodes), 768 GB RAM, 2 Intel Xeon Gold 6132@2.6 GHz [2x14c], 4 x Nvidia V100 SXM2 32GB, +561.6 GPU TFlops
      • (Q4 2018) 4 x Dell R840 (large-memory nodes), 3072 GB RAM, 4 Intel Xeon Platinum 8180M@2.5 GHz [4x28c]
  • The storage part of the iris cluster (Self Encrypting Disks (SED)-based storage) was also built in a stepwise approach as follows:
    • RFP 160019 (attributed to Telindus/HPE/DDN in Nov 2016): SpectrumScale/GPFS (DDN GridScaler), 2284 TB
      • Mounted in /mnt/irisgpfs
      • Initial deployment June 2017, capacity extension performed in April 2018
      • Performance validation (IOR)
        • read: 11.25 GB/sec, write: 9.46 GB/sec
    • RFP 170035 (attributed to Fujitsu/DDN in Dec. 2017): Lustre (DDN Exascaler), 1280 TB
      • Mounted in /mnt/lscratch
      • Initial deployment April 2018
      • Performance issue identified
      • Lustre Exascaler 4.0 upgrade Oct 2018 (see #905), last HW upgrade/final validation done in Nov 2018
      • Final performance evaluation (IOR)
        • before intervention: 12.6GB/s read, 9.94GB/s write
        • after intervention: 14.8GB/s read, 17.72GB/s write
  • A new RESIF/2018 Software Set update was released in Aug. 2018
    • Reference List: https://hpc.uni.lu/users/software/
    • currently includes more than 206 applications and libraries, in particular:
      • Machine Learning: PyTorch, TensorFlow, Keras, Apache Spark
      • Math & Optimization: MATLAB, Mathematica, CPLEX
      • Physics & Chemistry: GROMACS, ESPResSo, QuantumESPRESSO, Meep, ABINIT, NAMD, NWChem, VASP, CRYSTAL
      • Bioinformatics: SAMtools, BEDTools, BWA, BioPerl, FastQC, PLINK, SNPTEST, FASTX-Toolkit, TopHat, Bowtie2, Trinity, BLAST+, ABySS, mpiBLAST, HTSlib
      • Computer Aided Design & Engineering, CFD: ANSYS, OpenFOAM
      • Container systems: Singularity
  • Beyond the iris setup, we have continued to improve our internal workflows to prepare larger-scale deployments planned for 2019:
    • Consolidated integration of the SLURM batch scheduler (instead of OAR)
    • Consolidation of the High Availability (HA) setup
    • Improved system automation (Puppet / hiera), backup and monitoring
    • Continuous OS / software modules / security upgrade
      • Migration to Debian 8 on gaia and chaos
      • RESIF v2, updated software sets (2018a toolchain & co., see above)
      • Meltdown/Spectre processor vulnerability mitigation
  • Again, the gaia and chaos clusters will be DECOMMISSIONED mid-2019
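
The peak figures quoted per RFP in the compute breakdown above can be roughly cross-checked from the node lists. The sketch below is only an illustration and not the method used by the HPC team: it assumes the usual theoretical formula (nodes x cores per node x clock in GHz x double-precision flops per cycle), and the flops-per-cycle values (16 for the E5-2680v4, 32 for the Xeon Gold/Platinum parts) and the 7.8 TFlops per V100 SXM2 are assumptions that do not appear in this message. The names `cores`, `peak`, `RFP_160020` and `RFP_180027` are hypothetical helpers introduced for the example.

```python
# Rough cross-check of the theoretical peak figures listed per RFP.
# ASSUMPTIONS (not stated in the announcement):
#   - 16 DP flops/cycle/core for Intel Xeon E5-2680v4 (AVX2 with FMA)
#   - 32 DP flops/cycle/core for Xeon Gold 6132 / Platinum 8180M (AVX-512)
#   - 7.8 DP TFlops per Nvidia V100 SXM2 GPU

# Each entry: (description, nodes, cores per node, clock in GHz, flops/cycle)
RFP_160020 = [
    ("Dell C6320, E5-2680v4", 108, 28, 2.4, 16),
    ("Dell C6320, Gold 6132", 60, 28, 2.6, 32),
]
RFP_180027 = [
    ("Dell C4140, Gold 6132 (GPU nodes)", 18, 28, 2.6, 32),
    ("Dell R840, Platinum 8180M (large-memory)", 4, 112, 2.5, 32),
]

V100_TFLOPS = 7.8      # assumed per-GPU double-precision peak
GPUS_PER_NODE = 4
GPU_NODES = 18


def cores(groups):
    """Total number of CPU cores over all node groups."""
    return sum(nodes * cpn for _, nodes, cpn, _, _ in groups)


def peak(groups):
    """Theoretical CPU peak in TFlops over all node groups."""
    gflops = sum(nodes * cpn * ghz * fpc for _, nodes, cpn, ghz, fpc in groups)
    return gflops / 1000.0


gpu_peak = GPU_NODES * GPUS_PER_NODE * V100_TFLOPS

if __name__ == "__main__":
    for name, groups in (("RFP 160020", RFP_160020), ("RFP 180027", RFP_180027)):
        print(f"{name}: {cores(groups)} cores, {peak(groups):.2f} CPU TFlops")
    print(f"GPU peak: {gpu_peak:.1f} TFlops")
```

Under these assumptions the CPU totals come out at about 256 TFlops and 77.77 TFlops for the two RFPs, and the GPU total at 561.6 TFlops, in line with the figures listed. These are theoretical peaks and say nothing about sustained performance, which is what the planned validation runs measure.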

Outside the consolidation of the facility, this year has seen several significant events:

  • Two new successful editions of the Uni.lu HPC School were organized on June 12-13, 2018 and Nov. 23, 2018.
  • The Uni.lu HPC Team participated in several reference HPC conferences and exhibitions:
  • We gave several invited keynotes over the year:
  • We have continued the consolidation of the national and European HPC initiatives:
    • National HPC-Big Data Competence Center, in collaboration with the Ministry of Economy
    • NVIDIA Joint AI Lab, in collaboration with the Ministry of State
    • EuroHPC
    • PRACE - Partnership for Advanced Computing in Europe
      • Luxembourg Delegate: Prof. Pascal Bouvry, Luxembourg Advisor: Dr. Sebastien Varrette
    • ETP4HPC - European Technology Platform (ETP) for HPC
      • Luxembourg/UL representatives: Prof. Pascal Bouvry, Dr. Sebastien Varrette and Valentin Plugaru
      • See Strategic Research Agenda

Next actions

  • Performance validation of the new GPU and large-memory nodes will be performed in January
  • We are working hard with the Ministry of Economy and the Ministry of State to make concrete the creation of a National HPC-Big Data Competence Center and the establishment of an NVIDIA Joint AI Lab, coupled with the deployment of a Peta-scale class facility. We will keep you informed throughout 2019 of the progress of these exciting and challenging projects.
  • The next edition of the UL HPC School is scheduled for 2 days in June 2019; precise dates will be given in due time.
  • The Biocore team (and in particular Sarah Peter) has invested significant time in analysing its GDPR compliance and Data Protection Impact Assessment (DPIA). We are working together on extending this to the Uni.lu HPC; you will hear more about the outcomes of this analysis in 2019.
  • We have been working for three years to define a new official structure enabling Research Computing @ UL & abroad. Hopefully this effort will succeed in 2019 thanks to the support of the Rectorate and of our new Vice-rector for Research.

Thanks for your patience in reading this email. We wish you again a Merry Christmas and happy computing in the new year 2019.

The University of Luxembourg HPC Team, i.e.:

  • Prof. Pascal Bouvry, Senior advisor for the president as regards the HPC strategy, Leader of PCO Group, Head Uni.lu HPC
  • Dr. Sebastien Varrette, Research Scientist, Deputy head Uni.lu HPC
  • Valentin Plugaru, R&D Specialist, Senior HPC Architect
  • Sarah Peter, R&D Specialist, HPC/LCSB Support Liaison
  • Hyacinthe Cartiaux, HPC System administrator, and
  • Clément Parisot, HPC System administrator