
HPC @ Uni.lu

High Performance Computing in Luxembourg

This website is deprecated; the old pages are kept online, but you should refer primarily to the new website hpc.uni.lu and the new technical documentation site hpc-docs.uni.lu

The flagship Iris cluster features the following GPU-AI accelerators:

Node            Model                        #Nodes  #GPUs   CUDA Cores  Tensor Cores  RPeak (FP64)  RPeak Deep Learning (FP16)
iris-[169-186]  NVIDIA Tesla V100 SXM2 16G   18      4/node  5120/GPU    640/GPU       561.6 TFlops  9000 TFlops
iris-[191-196]  NVIDIA Tesla V100 SXM2 32G   6       4/node  5120/GPU    640/GPU       187.2 TFlops  3000 TFlops
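The aggregate RPeak figures above follow directly from the per-GPU peaks of the V100 (7.8 TFlops FP64, 125 TFlops FP16 via the Tensor Cores); a minimal sketch to reproduce them:

```python
# Reproduce the aggregate RPeak figures for the iris GPU nodes.
# Per-GPU peaks for the Tesla V100: 7.8 TFlops FP64 and 125 TFlops FP16
# (Tensor Core deep-learning throughput).

V100_FP64_TFLOPS = 7.8
V100_FP16_TFLOPS = 125.0

def rpeak(nodes: int, gpus_per_node: int, per_gpu_tflops: float) -> float:
    """Aggregate peak in TFlops for a homogeneous set of GPU nodes."""
    return nodes * gpus_per_node * per_gpu_tflops

# iris-[169-186]: 18 nodes x 4 GPUs
print(round(rpeak(18, 4, V100_FP64_TFLOPS), 1))  # 561.6
print(round(rpeak(18, 4, V100_FP16_TFLOPS), 1))  # 9000.0
# iris-[191-196]: 6 nodes x 4 GPUs
print(round(rpeak(6, 4, V100_FP64_TFLOPS), 1))   # 187.2
```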

Key features of Volta GPUs

  • New Streaming Multiprocessor (SM) Architecture Optimized for Deep Learning
  • Second-Generation NVLink: 300 GB/s bandwidth
  • High Bandwidth Memory - HBM2: 900 GB/s peak memory bandwidth

For more details see the Volta announcement article.
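As a rough sanity check, the per-GPU peaks quoted above can be derived from the boost clock (1530 MHz) and core counts, assuming the usual throughput figures for Volta: one FP64 unit per two CUDA cores, 2 FLOPs per FP64 unit per cycle (fused multiply-add), and 128 FP16 FLOPs per Tensor Core per cycle (64 FMAs). These per-cycle figures are assumptions from the Volta architecture, not stated on this page:

```python
# Back-of-the-envelope derivation of the V100 per-GPU peaks from
# 5120 CUDA cores, 640 Tensor Cores and a 1530 MHz boost clock.
BOOST_GHZ = 1.53

fp64_units = 5120 // 2                              # assumed 1 FP64 unit per 2 CUDA cores
fp64_peak_tflops = fp64_units * 2 * BOOST_GHZ / 1000
print(round(fp64_peak_tflops, 1))                   # ~7.8 TFlops FP64

tensor_peak_tflops = 640 * 128 * BOOST_GHZ / 1000   # assumed 128 FP16 FLOPs/cycle per Tensor Core
print(round(tensor_peak_tflops, 1))                 # ~125 TFlops FP16
```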

The following GPGPU accelerators are currently available on the Gaia cluster:

Node            Model               #Nodes  #GPUs/node  GPU cores                 RPeak
gaia-[61-62]    NVIDIA Tesla M2070  2       2           448/GPU  = 1792 total     2.06 TFlops
gaia-[63-72]    NVIDIA Tesla M2090  10      2           512/GPU  = 10240 total    13.3 TFlops
gaia-[75-79]    NVIDIA Tesla K40m   5       2           2880/GPU = 28800 total    14.3 TFlops
gaia-[179-182]  NVIDIA Tesla K80    4       4           4992/GPU = 79872 total    46.56 TFlops
Total                               21      50 GPUs     120704 total              76.22 TFlops
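The totals row can be checked against the per-row figures; a minimal sketch:

```python
# Verify the Gaia GPU totals from the per-row figures:
# (node range, #nodes, GPUs per node, cores per GPU, aggregate RPeak in TFlops).
rows = [
    ("gaia-[61-62]",    2, 2,  448,  2.06),   # Tesla M2070
    ("gaia-[63-72]",   10, 2,  512, 13.3),    # Tesla M2090
    ("gaia-[75-79]",    5, 2, 2880, 14.3),    # Tesla K40m
    ("gaia-[179-182]",  4, 4, 4992, 46.56),   # Tesla K80
]

total_gpus  = sum(n * g for _, n, g, _, _ in rows)
total_cores = sum(n * g * c for _, n, g, c, _ in rows)
total_rpeak = sum(r for *_, r in rows)

print(total_gpus)             # 50
print(total_cores)            # 120704
print(round(total_rpeak, 2))  # 76.22
```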

Features of Tesla-class GPUs on gaia

  • GPU powered by the massively parallel CUDA architecture, delivering cluster-class performance at 1/20th the power and 1/10th the cost of CPU-only systems built on quad-core CPUs.
  • IEEE 754 single and double Precision floating point units.
  • ECC Support
  • System Monitoring Features
  • Designed for Maximum Reliability – passive heatsink design eliminates moving parts and cables.
  • 6 or 12GB of GDDR5 memory per GPU
  • NVIDIA Parallel DataCache – accelerates algorithms such as physics solvers, ray-tracing, and sparse matrix multiplication where data addresses are not known beforehand.
  • NVIDIA GigaThread Engine – maximizes the throughput by faster context switching, concurrent kernel execution, and improved thread block scheduling.
  • Asynchronous Transfer – improves system throughput by overlapping data transfers with computation, even while the compute cores are busy.

Accelerator specifications

Characteristics               Tesla M2070        Tesla M2090        Tesla K40m               Tesla K80                Tesla V100
Chip                          Tesla T20 GPU      Tesla T20A GPU     Tesla K40m GPU           Tesla K80 GPU            Tesla V100 GPU
Compute capability (version)  2.0                2.0                3.5                      3.7                      7.0
Processor clock               1.15 GHz           1.3 GHz            745 MHz (boost 875 MHz)  560 MHz (boost 875 MHz)  1300 MHz (boost 1530 MHz)
Processing cores              448                512                2880                     4992 (2 x 2496)          5120 CUDA + 640 Tensor
Memory I/O                    384-bit GDDR5      384-bit GDDR5      384-bit GDDR5            384-bit GDDR5            4096-bit HBM2
Memory bandwidth (ECC off)    150 GB/s           177 GB/s           288 GB/s                 480 GB/s                 900 GB/s
Double precision (FP64) peak  515 GFlops         665 GFlops         1.43 TFlops              2.91 TFlops              7.8 TFlops
Single precision (FP32) peak  1.03 TFlops        1.331 TFlops       4.29 TFlops              8.74 TFlops              15.7 TFlops
Total dedicated memory        6 GB GDDR5         6 GB GDDR5         12 GB                    24 GB (2 x 12 GB)        16 GB
Max power consumption         225 W              225 W              235 W                    300 W                    300 W
Thermal cooling solution      Passive heat sink  Passive heat sink  Passive heat sink        Passive heat sink        Passive heat sink
System interface              PCIe x16 Gen2      PCIe x16 Gen2      PCIe x16 Gen3            PCIe x16 Gen3            NVLink 2.0

Software development tools: CUDA C/C++/Fortran, OpenCL and DirectCompute on all models, plus Parallel Nsight and ArrayFire up to the K80, and OpenACC on the V100.
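The FP64 peaks in the table can be cross-checked from the core counts and clocks, assuming the usual FP64:FP32 unit ratios for Tesla-class parts of each generation (1:2 on Fermi, 1:3 on Kepler) and 2 FLOPs per FP64 unit per cycle (FMA). These ratios are architectural assumptions, not stated on this page; note the K40m figure is quoted at the base clock while the K80 figure is quoted at the boost clock:

```python
# Hypothetical cross-check of the FP64 peak figures in the table above.
def fp64_peak_gflops(cuda_cores: int, fp64_ratio: int, clock_ghz: float) -> float:
    """Peak FP64 GFlops: (FP64 units) x 2 FLOPs/cycle x clock."""
    return (cuda_cores // fp64_ratio) * 2 * clock_ghz

print(round(fp64_peak_gflops(448, 2, 1.15)))    # ~515  (M2070, Fermi 1:2)
print(round(fp64_peak_gflops(512, 2, 1.30)))    # ~666  (M2090, quoted as 665)
print(round(fp64_peak_gflops(2880, 3, 0.745)))  # ~1430 (K40m at base clock, 1.43 TFlops)
print(round(fp64_peak_gflops(4992, 3, 0.875)))  # ~2912 (K80 at boost clock, 2.91 TFlops)
```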