
HPC @ Uni.lu

High Performance Computing in Luxembourg

This website is deprecated; the old pages are kept online, but you should refer primarily to the new website hpc.uni.lu and the new technical documentation site hpc-docs.uni.lu

Overview

The cluster is organized as follows:

Overview of the Iris cluster (figure, also available as a PDF download)

Computing Capacity

The cluster is composed of the following computing elements:

Important

  • Skylake processors (iris-[109-196] nodes) perform 32 DP ops/cycle and support the AVX-512 instruction set.
  • Broadwell processors (iris-[1-108] nodes) perform 16 DP ops/cycle and support AVX2/FMA3.

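As a quick way to check which instruction set the node you landed on supports, the CPU flags can be read from /proc/cpuinfo. This is a minimal, Linux-specific sketch and not an official tool of the platform:

```python
# Minimal sketch: detect AVX-512 vs. AVX2/FMA3 support on the current node
# by inspecting the CPU flags exposed by the Linux kernel (/proc/cpuinfo).
def cpu_flags():
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
if "avx512f" in flags:
    print("Skylake-class node: AVX-512 available (32 DP ops/cycle)")
elif "avx2" in flags and "fma" in flags:
    print("Broadwell-class node: AVX2/FMA3 available (16 DP ops/cycle)")
else:
    print("Older/unknown CPU: no AVX2/AVX-512 detected")
```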
GPGPU accelerators

The flagship Iris cluster features the following GPU-AI accelerators:

Node           | Model                      | #nodes | #GPUs  | CUDA Cores | Tensor Cores | RPeak DP      | RPeak Deep Learning (FP16)
iris-[169-186] | NVIDIA Tesla V100 SXM2 16G | 18     | 4/node | 5120/GPU   | 640/GPU      | 561.6 TFlops  | 9000 TFlops
iris-[191-196] | NVIDIA Tesla V100 SXM2 32G | 6      | 4/node | 5120/GPU   | 640/GPU      | 187.2 TFlops  | 3000 TFlops
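The RPeak columns follow directly from the per-GPU peaks of the Tesla V100 (about 7.8 TFlops in double precision and 125 TFlops in FP16 on the Tensor Cores), as the following back-of-the-envelope check illustrates:

```python
# Back-of-the-envelope check of the RPeak columns above.
# Assumed per-GPU peaks for the Tesla V100 SXM2: ~7.8 TFlops (FP64)
# and ~125 TFlops (FP16 on Tensor Cores).
V100_DP_TFLOPS = 7.8
V100_FP16_TFLOPS = 125.0

for name, nodes, gpus_per_node in [("iris-[169-186]", 18, 4),
                                   ("iris-[191-196]", 6, 4)]:
    n_gpus = nodes * gpus_per_node
    print(f"{name}: {n_gpus * V100_DP_TFLOPS:.1f} TFlops DP, "
          f"{n_gpus * V100_FP16_TFLOPS:.0f} TFlops FP16")
# -> iris-[169-186]: 561.6 TFlops DP, 9000 TFlops FP16
# -> iris-[191-196]: 187.2 TFlops DP, 3000 TFlops FP16
```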

Interconnect

The following schema describes the topology of the Iris InfiniBand EDR (100 Gb/s) network.

Additionally, the cluster is connected to the infrastructure of the University using 2x40Gb Ethernet links and to the internet using 2x10Gb Ethernet links.

A third 1Gb Ethernet network is also used on the cluster, mainly for services and administration purposes.

The performance of the network has been measured using the MVAPICH OSU Micro-Benchmarks. The results are presented below.

  • OSU latency results: /images/benchs/benchmark_OSU-iris_latency.pdf
  • OSU bandwidth results: /images/benchs/benchmark_OSU-iris_bandwidth.pdf
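For a rough idea of how such figures are obtained, the sketch below reproduces the spirit of the OSU point-to-point latency/bandwidth tests using mpi4py. It is an illustration only, not the actual benchmark suite, and assumes mpi4py is available and the script is launched on two processes placed on different nodes:

```python
# Minimal OSU-style point-to-point ping-pong test with mpi4py (illustrative only).
# Run with e.g.:  mpirun -np 2 python3 pingpong.py
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
msg = bytearray(1 << 20)          # 1 MiB message
iters = 100

comm.Barrier()
start = time.perf_counter()
for _ in range(iters):
    if rank == 0:
        comm.Send(msg, dest=1)
        comm.Recv(msg, source=1)
    else:
        comm.Recv(msg, source=0)
        comm.Send(msg, dest=0)
elapsed = time.perf_counter() - start

if rank == 0:
    latency = elapsed / (2 * iters)          # one-way time per message
    bandwidth = len(msg) / latency / 1e9     # rough estimate in GB/s
    print(f"~{latency * 1e6:.1f} us one-way, ~{bandwidth:.2f} GB/s")
```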

Storage / Cluster File System

The cluster relies on 3 types of distributed/parallel file systems to deliver high-performance data storage at BigData scale (i.e. multiple petabytes).

FileSystem           | Usage    | #encl | #disks | Raw Capacity [TB] | Max I/O Bandwidth
SpectrumScale (GPFS) | Home     | 5     | 390    |                   | Read: 10 GiB/s / Write: 10 GiB/s
Lustre               | Scratch  | 4     | 186    |                   | Read: 10 GiB/s / Write: 10 GiB/s
Isilon OneFS         | Projects | 29    | 1044   |                   | n/a
Total                |          | 38    | 1620   | 3936              |

The current effective shared storage capacity on the Iris cluster is estimated at 5.6 PB:
  • GPFS: 1525 TB
  • Lustre: 919 TB
  • Isilon: 3188 TB
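The 5.6 PB figure is simply the sum of the three effective capacities listed above:

```python
# Sum of the effective capacities listed above (in TB).
effective_tb = {"GPFS": 1525, "Lustre": 919, "Isilon": 3188}
total_tb = sum(effective_tb.values())
print(f"Total effective capacity: {total_tb} TB (~{total_tb / 1000:.1f} PB)")
# -> Total effective capacity: 5632 TB (~5.6 PB)
```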

GPFS storage

In terms of storage, a dedicated SpectrumScale (GPFS) system is responsible for sharing specific folders (most importantly, the users' home directories) across the nodes of the cluster.

A DDN GridScaler solution hosts the SpectrumScale filesystem and is composed of a GS7K base enclosure (running the GPFS NSDs) and 4 SS8460 expansion enclosures, containing a total of 390 disks (360x 6TB SED + 30x SSD). The raw capacity is 2180 TB, split into 35 x RAID6 arrays of 10 disks (8+2).
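As a rough consistency check of this layout, the raw and usable HDD capacities implied by the RAID6 (8+2) scheme can be estimated as follows; the sketch ignores the SSD pools, hot spares and filesystem overhead, so it is only indicative of the raw and effective figures quoted above:

```python
# Rough estimate of the GPFS storage capacity implied by the RAID6 (8+2) layout.
# Ignores the 30 SSDs, hot spares and filesystem overhead.
hdd_size_tb = 6           # 6 TB SED drives
hdd_count = 360
raid_groups = 35
data_disks_per_group = 8  # RAID6: 8 data + 2 parity disks per group

raw_tb = hdd_count * hdd_size_tb
usable_tb = raid_groups * data_disks_per_group * hdd_size_tb
print(f"Raw HDD capacity:   {raw_tb} TB")     # 2160 TB
print(f"Usable after RAID6: {usable_tb} TB")  # 1680 TB
```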

Lustre storage

For high-speed, temporary I/O, a dedicated Lustre system currently holds per-user directories.

A DDN ExaScaler solution hosts the Lustre filesystem and is composed of two SS7700 base enclosures, each with 2x SS8460 expansions, and an internal InfiniBand fabric linking the block storage to dedicated, redundant MDS (metadata) and OSS (object storage) servers. The complete solution contains a total of 186 disks (167x 8TB SED + 19x SSD). The raw capacity is 1300 TB, split into 16 x RAID6 arrays of 10 disks (8+2).

Isilon / OneFS

In 2014, the SIU, the UL HPC team and the LCSB joined forces (and funding) to acquire a scalable and modular NAS solution able to sustain the need for internal big data storage, i.e. to provide space for centralized data and backups of all devices used by UL staff, as well as for all research-related data, including data processed on the UL HPC platform.

At the end of a public call for tender released in 2014, the EMC Isilon system was selected, with an effective deployment in 2015. It is physically hosted in the new CDC (Centre de Calcul) server room in the Maison du Savoir. Composed of 29 enclosures running the OneFS file system, it currently offers an effective capacity of 3.1 PB.

Local storage

All nodes are equipped with local SSDs; you can therefore write to /tmp and get very good performance in terms of I/O operations and throughput.
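For example, a job can stage its temporary files on the node-local SSD via the standard tempfile module. This is a generic sketch, not a platform-specific recipe:

```python
# Generic sketch: stage temporary job data on the node-local SSD under /tmp.
import os
import tempfile

# Create a scratch directory on the local SSD and work inside it.
with tempfile.TemporaryDirectory(dir="/tmp") as scratch:
    data_file = os.path.join(scratch, "intermediate.dat")
    with open(data_file, "wb") as f:
        f.write(os.urandom(16 * 1024 * 1024))   # 16 MiB of throw-away data
    print(f"Wrote {os.path.getsize(data_file)} bytes to {data_file}")
# The directory and its contents are removed automatically on exit.
```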

History

The Iris cluster has been in operation since the beginning of 2017 and is the most powerful computing platform available within the University of Luxembourg.

  • March 2017: Initialization of the cluster composed of:

    • iris-[1-100], Dell PowerEdge C6320, 100 nodes, 2800 cores, 12.8 TB RAM
    • 10/40GB Ethernet network, high-speed Infiniband EDR 100Gb/s interconnect
    • SpectrumScale (GPFS) core storage, 1.44 PB
    • Redundant / load-balanced services with:
      • 2x adminfront servers (cluster management)
      • 2x access servers (user frontend)
  • May 2017: 8 new regular nodes added
    • iris-[101-108], Dell PowerEdge C6320, 8 nodes, 224 cores, 1.024 TB RAM
  • Dec. 2017: 60 new regular nodes added, this time based on Skylake processors
    • iris-[109-168], Dell PowerEdge C6420, 60 nodes, 1680 cores, 7.68 TB RAM
  • Feb 2018: SpectrumScale (GPFS) extension to reach 2284TB raw capacity
    • new expansion unit and provisioning of enough complementary disks to feed the system.
  • March 2018: complementary Lustre storage, 1300 TB raw capacity

  • Dec 2018: New GPU and Bigmem nodes
    • iris-[169-186]: Dell C4140, 18 GPU nodes x 4 Nvidia V100 SXM2 16GB, part of the gpu partition
    • iris-[187-190]: Dell R840, 4 bigmem nodes with 4x 28 cores, i.e. 112 cores per node, part of the bigmem partition
  • May 2019: 6 new GPU nodes
    • iris-[191-196]: Dell C4140, 6 GPU nodes x 4 Nvidia V100 SXM2 32GB, part of the gpu partition
  • Oct 2019: SpectrumScale (GPFS) extension to allow 1Bn files capacity
    • replacement of 2 data pools (HDD-based) with new metadata pools (SSD-based)