This website is deprecated; the old pages are kept online, but you should refer in priority to the new website hpc.uni.lu and the new technical documentation site hpc-docs.uni.lu.
Accelerators Available on UL HPC
The flagship Iris cluster features the following GPU accelerators:
Node | Model | #nodes | #GPUs | CUDA Cores | Tensor Cores | RPeak DP | RPeak Deep Learning (FP16) |
---|---|---|---|---|---|---|---|
iris-[169-186] | NVIDIA Tesla V100 SXM2 16 GB | 18 | 4/node | 5120/GPU | 640/GPU | 561.6 TFlops | 9000 TFlops |
iris-[191-196] | NVIDIA Tesla V100 SXM2 32 GB | 6 | 4/node | 5120/GPU | 640/GPU | 187.2 TFlops | 3000 TFlops |
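As a sanity check, the aggregate RPeak figures above follow directly from the per-GPU peaks quoted for the V100 SXM2 (7.8 TFlops double precision, 125 TFlops mixed-precision Tensor Core throughput per GPU; the 125 TFlops figure is taken from the NVIDIA datasheet, not from the table above). A minimal sketch of the arithmetic:

```python
# Assumed per-GPU peaks for NVIDIA V100 SXM2 (datasheet values):
V100_DP_TFLOPS = 7.8      # double-precision peak per GPU
V100_TENSOR_TFLOPS = 125  # mixed-precision Tensor Core peak per GPU
GPUS_PER_NODE = 4         # per the table above

def rpeak(nodes, per_gpu_tflops):
    """Aggregate peak = nodes x GPUs/node x per-GPU peak."""
    return nodes * GPUS_PER_NODE * per_gpu_tflops

print(round(rpeak(18, V100_DP_TFLOPS), 1))  # 561.6 TFlops (iris-[169-186])
print(rpeak(18, V100_TENSOR_TFLOPS))        # 9000 TFlops FP16
print(round(rpeak(6, V100_DP_TFLOPS), 1))   # 187.2 TFlops (iris-[191-196])
print(rpeak(6, V100_TENSOR_TFLOPS))         # 3000 TFlops FP16
```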
Key features of Volta GPUs
- New Streaming Multiprocessor (SM) Architecture Optimized for Deep Learning
- Second-Generation NVLink: 300 GB/s bandwidth
- High Bandwidth Memory - HBM2: 900 GB/s peak memory bandwidth
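The ~900 GB/s HBM2 figure can be sketched from the memory interface itself, assuming the 4096-bit bus from the specs table and a per-pin data rate of about 1.75 Gb/s (an assumed value; NVIDIA quotes only the aggregate bandwidth):

```python
# HBM2 bandwidth sketch: bandwidth = bus width (bits) / 8 * per-pin data rate.
bus_bits = 4096          # V100 HBM2 interface width
pin_rate_gbps = 1.75     # assumed Gb/s per pin (~877 MHz double data rate)
print(bus_bits / 8 * pin_rate_gbps)  # 896.0 GB/s, i.e. ~900 GB/s peak
```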
For more details see the Volta announcement article.
The following GPGPU accelerators are currently available on the Gaia cluster:
Node | Model | #nodes | #GPU boards | #GPU cores | RPeak (DP) |
---|---|---|---|---|---|
gaia-[61-62] | NVIDIA Tesla M2070 | 2 | 2/node | 448c/board = 1792c | 2.06 TFlops |
gaia-[63-72] | NVIDIA Tesla M2090 | 10 | 2/node | 512c/board = 10240c | 13.3 TFlops |
gaia-[75-79] | NVIDIA Tesla K40m | 5 | 2/node | 2880c/board = 28800c | 14.3 TFlops |
gaia-[179-182] | NVIDIA Tesla K80 | 4 | 4/node | 4992c/board = 79872c | 46.56 TFlops |
Total | | 21 | 50 | 120704c | 76.22 TFlops |
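The per-board RPeak values above can be cross-checked from core counts and clocks (clocks as in the specs table further below). A rough sketch, assuming the usual double-precision throughput ratios of 1/2 of single precision on Fermi (M2070/M2090) and 1/3 on Kepler (K40/K80), and 2 FLOPs per core per cycle via fused multiply-add:

```python
# Rough cross-check of per-board double-precision peaks.
def dp_gflops(cuda_cores, clock_ghz, dp_ratio):
    # peak = cores * DP ratio * 2 FLOPs/cycle (FMA) * clock
    return cuda_cores * dp_ratio * 2 * clock_ghz

print(round(dp_gflops(448, 1.15, 1/2)))      # 515 GFlops (M2070)
print(round(dp_gflops(512, 1.30, 1/2)))      # 666 GFlops (M2090, quoted as 665)
print(round(dp_gflops(2880, 0.745, 1/3)))    # 1430 GFlops = 1.43 TFlops (K40m)
print(round(dp_gflops(2*2496, 0.875, 1/3)))  # 2912 GFlops = 2.91 TFlops (K80, at boost clock)
```

Note that the K80 figure is quoted at its boost clock, while the K40m figure uses the base clock.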
Features of the Tesla-class GPUs on Gaia
- GPUs powered by the massively parallel CUDA architecture; NVIDIA claims cluster-level performance at 1/20th the power and 1/10th the cost of CPU-only systems built on contemporary quad-core CPUs.
- IEEE 754-compliant single- and double-precision floating-point units.
- ECC Support
- System Monitoring Features
- Designed for Maximum Reliability – passive heatsink design eliminates moving parts and cables.
- 6 or 12GB of GDDR5 memory per GPU
- NVIDIA Parallel DataCache – accelerates algorithms such as physics solvers, ray-tracing, and sparse matrix multiplication where data addresses are not known beforehand.
- NVIDIA GigaThread Engine – maximizes the throughput by faster context switching, concurrent kernel execution, and improved thread block scheduling.
- Asynchronous Transfer – turbocharges system performance by executing data transfers, even when the computing cores are busy.
Accelerator specifications
Characteristics | Tesla M2070 | Tesla M2090 | Tesla K40m | Tesla K80 | Tesla V100 |
---|---|---|---|---|---|
Chip | T20 (GF100, Fermi) | T20A (GF110, Fermi) | GK110B (Kepler) | 2 × GK210 (Kepler) | GV100 (Volta) |
Compute capability (version) | 2.0 | 2.0 | 3.5 | 3.7 | 7.0 |
Processor clock | 1.15 GHz | 1.3 GHz | 745 MHz (boost to 875 MHz) | 560 MHz (boost to 875 MHz) | 1300 MHz (boost to 1530 MHz) |
Number of processing cores | 448c | 512c | 2880c | 4992c (2 x 2496c) | 5120c CUDA + 640c Tensor |
Memory I/O | 384-bit GDDR5 | 384-bit GDDR5 | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 |
Memory bandwidth (ECC off) | 150 GB/s | 177 GB/s | 288 GB/s | 480 GB/s | 900 GB/s |
Double Precision Floating Point Performance | 515 GFlops | 665 GFlops | 1.43 TFlops | 2.91 TFlops | 7.8 TFlops |
Single Precision Floating Point Performance | 1.03 TFlops | 1.33 TFlops | 4.29 TFlops | 8.74 TFlops | 15.7 TFlops |
Total Dedicated Memory | 6 GB GDDR5 | 6 GB GDDR5 | 12 GB GDDR5 | 24 GB GDDR5 (2 × 12 GB) | 16 GB HBM2 |
Max Power Consumption | 225 W | 225 W | 235 W | 300 W | 300 W |
Thermal cooling solution | Passive heat sink | Passive heat sink | Passive heat sink | Passive heat sink | Passive heat sink |
System Interface | PCIe x16 Gen2 | PCIe x16 Gen2 | PCIe x16 Gen3 | PCIe x16 Gen3 | NVLink 2.0 (SXM2) |
Software Development Tools | CUDA C/C++/Fortran, OpenCL, DirectCompute, Parallel Nsight, ArrayFire | CUDA C/C++/Fortran, OpenCL, DirectCompute, Parallel Nsight, ArrayFire | CUDA C/C++/Fortran, OpenCL, DirectCompute, Parallel Nsight, ArrayFire | CUDA C/C++/Fortran, OpenCL, DirectCompute, Parallel Nsight, ArrayFire | CUDA C/C++/Fortran, OpenCL, OpenACC |
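When compiling CUDA code for these cards, the compute capability row above determines the `nvcc` architecture flag to target. A minimal lookup sketch (the mapping of compute capability to `sm_XX` flags follows the CUDA toolkit convention):

```python
# Compute capability -> nvcc architecture flag for the UL HPC accelerators.
ARCH = {
    "Tesla M2070": "sm_20",  # Fermi, compute capability 2.0
    "Tesla M2090": "sm_20",  # Fermi, compute capability 2.0
    "Tesla K40m":  "sm_35",  # Kepler, compute capability 3.5
    "Tesla K80":   "sm_37",  # Kepler, compute capability 3.7
    "Tesla V100":  "sm_70",  # Volta, compute capability 7.0
}

def nvcc_flag(card):
    """Return the nvcc -arch flag matching a given card's compute capability."""
    return f"-arch={ARCH[card]}"

print(nvcc_flag("Tesla V100"))  # -arch=sm_70
```

For example, `nvcc -arch=sm_70 kernel.cu` targets the V100 nodes on Iris.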