The Iris Cluster
The cluster is organized as follows:
The cluster is composed of the following computing elements:
- Skylake processors (iris-[109-190] nodes) carry out 32 DP ops/cycle and support the AVX-512 instruction set.
- Broadwell processors (iris-[1-108] nodes) carry out 16 DP ops/cycle and support AVX2/FMA3.
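The DP ops/cycle figures above turn into a theoretical peak (Rpeak) once multiplied by the number of cores and the clock frequency. The following is a minimal sketch of that arithmetic, not an official figure for the cluster: the helper names `count_nodes` and `rpeak_tflops` are hypothetical, 28 cores per node is derived from the 100 nodes / 2800 cores quoted for the initial Broadwell nodes, and the 2.4 GHz clock is purely an illustrative assumption that is not stated on this page.

```python
import re


def count_nodes(hostlist: str) -> int:
    """Count the nodes in a simple range expression such as 'iris-[1-108]'."""
    match = re.fullmatch(r"[\w-]+\[(\d+)-(\d+)\]", hostlist)
    if match is None:
        raise ValueError(f"unsupported hostlist: {hostlist!r}")
    first, last = int(match.group(1)), int(match.group(2))
    return last - first + 1


def rpeak_tflops(nodes, cores_per_node, clock_ghz, dp_ops_per_cycle):
    """Theoretical DP peak in TFlops: nodes x cores x clock (GHz) x ops/cycle."""
    gflops = nodes * cores_per_node * clock_ghz * dp_ops_per_cycle
    return gflops / 1000.0


if __name__ == "__main__":
    broadwell_nodes = count_nodes("iris-[1-108]")
    # ASSUMPTION: 28 cores/node and a 2.4 GHz clock are illustrative inputs only.
    peak = rpeak_tflops(broadwell_nodes, cores_per_node=28,
                        clock_ghz=2.4, dp_ops_per_cycle=16)
    print(f"{broadwell_nodes} Broadwell nodes: about {peak:.1f} TFlops (DP)")
```

Substituting the actual core count and base frequency of each processor generation gives the corresponding peak; the doubling from 16 to 32 DP ops/cycle is what AVX-512 contributes at equal clock and core count.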
The flagship Iris cluster features the following GPU-AI accelerators:
|Node||Model||#nodes||#GPUs||CUDA Cores||Tensor Cores||RPeak DP||RPeak Deep Learning (FP16)|
|iris-[169-186]||NVIDIA Tesla V100 SXM2||18||4/node||5120/GPU||640/GPU||561.6 TFlops||9000 TFlops|
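As a sanity check, the aggregate figures in the table can be broken down per GPU. The sketch below uses only the numbers from the table; the constant and function names are hypothetical.

```python
# Values copied from the accelerator table above.
NODES = 18
GPUS_PER_NODE = 4
CUDA_CORES_PER_GPU = 5120
TENSOR_CORES_PER_GPU = 640
RPEAK_DP_TFLOPS = 561.6
RPEAK_FP16_TFLOPS = 9000.0


def total_gpus(nodes=NODES, gpus_per_node=GPUS_PER_NODE):
    """Total number of GPUs across all accelerated nodes."""
    return nodes * gpus_per_node


def per_gpu(total_tflops, gpus=None):
    """Split an aggregate peak figure evenly across all GPUs."""
    if gpus is None:
        gpus = total_gpus()
    return total_tflops / gpus


if __name__ == "__main__":
    gpus = total_gpus()
    print(f"GPUs:          {gpus}")
    print(f"CUDA cores:    {gpus * CUDA_CORES_PER_GPU}")
    print(f"Tensor cores:  {gpus * TENSOR_CORES_PER_GPU}")
    print(f"DP per GPU:    {per_gpu(RPEAK_DP_TFLOPS):.1f} TFlops")
    print(f"FP16 per GPU:  {per_gpu(RPEAK_FP16_TFLOPS):.1f} TFlops")
```

The per-GPU results (about 7.8 TFlops DP and 125 TFlops FP16) match the peak values commonly quoted for the Tesla V100 SXM2.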
The following schema describes the topology of the Iris Infiniband EDR (100Gb/s) Network.
Additionally, the cluster is connected to the infrastructure of the University using 2x40Gb Ethernet links and to the internet using 2x10Gb Ethernet links.
A third 1Gb Ethernet network is also used on the cluster, mainly for services and administration purposes.
Network performance has been measured using the MVAPICH OSU Micro-Benchmarks. The results are presented below.
Storage / Cluster File System
The cluster relies on three types of distributed/parallel file systems to deliver high-performance data storage at big-data scale (i.e., TB).
|FileSystem||Usage||#encl||#disk||Raw Capacity [TB]||Max I/O Bandwidth|
|SpectrumScale (GPFS)||Home||5||390||2284||Read: 10 GiB/s / Write: 10 GiB/s|
|Lustre||Scratch||4||186||1300||Read: 10 GiB/s / Write: 10 GiB/s|
- GPFS: 1612 TB
- Lustre: 919 TB
- Isilon: 3188 TB
In terms of storage, a dedicated SpectrumScale (GPFS) system is responsible for sharing specific folders (most importantly, users' home directories) across the nodes of the cluster.
A DDN GridScaler solution hosts the SpectrumScale file system. It is composed of a GS7K base enclosure (running the GPFS NSDs) and 4 SS8460 expansion enclosures, containing a total of 390 disks (380x 6TB SED + 10x SSD). The raw capacity is 2284TB, split into 37 RAID 6 groups of 10 disks each (8+2).
For high-speed, temporary I/O, a dedicated Lustre system currently holds per-user directories.
A DDN ExaScaler solution hosts the Lustre file system. It is composed of two SS7700 base enclosures, each with 2x SS8460 expansions, and an internal Infiniband fabric linking the block storage to dedicated, redundant MDS (metadata) and OSS (object storage) servers. The complete solution contains a total of 186 disks (167x 8TB SED + 19x SSD). The raw capacity is 1300TB, split into 16 RAID 6 groups of 10 disks each (8+2).
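The RAID 6 (8+2) layouts described above imply that 2 out of every 10 disks in a group hold parity. The following rough sketch estimates how much space the RAID groups provide; the function name is hypothetical, and the calculation deliberately ignores SSDs, spare disks, file system overhead and the TB/TiB distinction, so its results are not expected to match the quoted raw capacities exactly.

```python
def raid6_capacity_tb(groups, disks_per_group, disk_tb, parity_disks=2):
    """Return (capacity of all disks in RAID groups, capacity left for data) in TB."""
    data_disks = disks_per_group - parity_disks
    in_raid_tb = groups * disks_per_group * disk_tb
    data_tb = groups * data_disks * disk_tb
    return in_raid_tb, data_tb


if __name__ == "__main__":
    # Figures taken from the text: 37 groups of 6TB disks (GPFS),
    # 16 groups of 8TB disks (Lustre), 10 disks per group (8 data + 2 parity).
    for name, groups, disk_tb in (("GPFS", 37, 6), ("Lustre", 16, 8)):
        in_raid, data = raid6_capacity_tb(groups, 10, disk_tb)
        print(f"{name}: {in_raid} TB in RAID groups, {data} TB after parity")
```

With an 8+2 layout, 80% of the space in each group is available for data, and each group tolerates the loss of any two of its disks.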
Isilon / OneFS
In 2014, the SIU, the UL HPC and the LCSB joined forces (and funding) to acquire a scalable and modular NAS solution able to meet the need for internal big data storage, i.e. to provide space for centralized data and backups of all devices used by UL staff, and for all research-related data, including the data processed on the UL HPC platform.
Following a public call for tender released in 2014, the EMC Isilon system was selected and effectively deployed in 2015. It is physically hosted in the new CDC (Centre de Calcul) server room in the Maison du Savoir. Composed of 29 enclosures running the OneFS file system, it currently offers an effective capacity of 3.1 PB.
All the nodes are equipped with SSD disks, so you can write to /tmp and get decent performance in terms of I/O operations and throughput.
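To get a rough idea of what the node-local /tmp delivers, a small sequential write test can be run from within a job. The sketch below is only an illustration: the function name and the default sizes are hypothetical, the result depends heavily on caching and on the current load of the node, and it is not a substitute for a proper I/O benchmark.

```python
import os
import tempfile
import time


def measure_write_mib_s(directory, size_mib=64, block_mib=4):
    """Write a temporary file in `directory` and return the throughput in MiB/s."""
    block = os.urandom(block_mib * 1024 * 1024)
    written_mib = 0
    # The temporary file is deleted automatically when the block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        start = time.perf_counter()
        while written_mib < size_mib:
            handle.write(block)
            written_mib += block_mib
        handle.flush()
        os.fsync(handle.fileno())  # make sure the data reached the device
        elapsed = time.perf_counter() - start
    return written_mib / elapsed


if __name__ == "__main__":
    target = tempfile.gettempdir()  # usually /tmp on Linux
    print(f"{target}: {measure_write_mib_s(target):.0f} MiB/s sequential write")
```

Passing a directory on a shared file system instead of /tmp allows a like-for-like comparison, keeping in mind that a single small run is only indicative.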
The Iris cluster has been in operation since the beginning of 2017 as the most powerful computing platform available within the University of Luxembourg.
March 2017: Initialization of the cluster composed of:
- iris-[1-100], Dell PowerEdge C6320, 100 nodes, 2800 cores, 12.8 TB RAM
- 10/40Gb Ethernet network, high-speed Infiniband EDR 100Gb/s interconnect
- SpectrumScale (GPFS) core storage, 1.44 PB
- Redundant / load-balanced services with:
  - 2x adminfront servers (cluster management)
  - 2x access servers (user frontend)