
HPC @ Uni.lu

High Performance Computing in Luxembourg

This website is deprecated; the old pages are kept online, but you should refer primarily to the new website hpc.uni.lu and the new technical documentation site hpc-docs.uni.lu

The flagship Iris cluster features the following GPU-AI accelerators:

Node            Model                        #Nodes  #GPUs   CUDA Cores  Tensor Cores  RPeak (FP64)  RPeak Deep Learning (FP16)
iris-[169-186]  NVIDIA Tesla V100 SXM2 16G   18      4/node  5120/GPU    640/GPU       561.6 TFlops  9000 TFlops
iris-[191-196]  NVIDIA Tesla V100 SXM2 32G   6       4/node  5120/GPU    640/GPU       187.2 TFlops  3000 TFlops
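The aggregate RPeak figures above follow directly from the per-GPU peaks of the V100 (7.8 TFlops FP64, 125 TFlops FP16 via the Tensor Cores); a minimal sketch to reproduce them:

```python
# Reproduce the aggregate RPeak figures for the iris GPU nodes.
# Per-GPU peaks for the Tesla V100: 7.8 TFlops FP64 and 125 TFlops FP16
# (Tensor Core deep-learning throughput).

V100_FP64_TFLOPS = 7.8
V100_FP16_TFLOPS = 125.0

def rpeak(nodes: int, gpus_per_node: int, per_gpu_tflops: float) -> float:
    """Aggregate peak in TFlops for a homogeneous set of GPU nodes."""
    return nodes * gpus_per_node * per_gpu_tflops

# iris-[169-186]: 18 nodes x 4 GPUs
print(round(rpeak(18, 4, V100_FP64_TFLOPS), 1))  # 561.6
print(round(rpeak(18, 4, V100_FP16_TFLOPS), 1))  # 9000.0
# iris-[191-196]: 6 nodes x 4 GPUs
print(round(rpeak(6, 4, V100_FP64_TFLOPS), 1))   # 187.2
```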

Key features of Volta GPUs

  • New Streaming Multiprocessor (SM) Architecture Optimized for Deep Learning
  • Second-Generation NVLink: 300 GB/s bandwidth
  • High Bandwidth Memory - HBM2: 900 GB/s peak memory bandwidth

For more details see the Volta announcement article.
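As a rough sanity check, the per-GPU peaks quoted above can be derived from the boost clock (1530 MHz) and core counts, assuming the usual throughput figures for Volta: one FP64 unit per two CUDA cores, 2 FLOPs per FP64 unit per cycle (fused multiply-add), and 128 FP16 FLOPs per Tensor Core per cycle (64 FMAs). These per-cycle figures are assumptions from the Volta architecture, not stated on this page:

```python
# Back-of-the-envelope derivation of the V100 per-GPU peaks from
# 5120 CUDA cores, 640 Tensor Cores and a 1530 MHz boost clock.
BOOST_GHZ = 1.53

fp64_units = 5120 // 2                              # assumed 1 FP64 unit per 2 CUDA cores
fp64_peak_tflops = fp64_units * 2 * BOOST_GHZ / 1000
print(round(fp64_peak_tflops, 1))                   # ~7.8 TFlops FP64

tensor_peak_tflops = 640 * 128 * BOOST_GHZ / 1000   # assumed 128 FP16 FLOPs/cycle per Tensor Core
print(round(tensor_peak_tflops, 1))                 # ~125 TFlops FP16
```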

The following GPGPU accelerators are currently available on the Gaia cluster:

Node            Model               #Nodes  #GPUs/node  GPU cores                 RPeak
gaia-[61-62]    NVIDIA Tesla M2070  2       2           448/GPU  = 1792 total     2.06 TFlops
gaia-[63-72]    NVIDIA Tesla M2090  10      2           512/GPU  = 10240 total    13.3 TFlops
gaia-[75-79]    NVIDIA Tesla K40m   5       2           2880/GPU = 28800 total    14.3 TFlops
gaia-[179-182]  NVIDIA Tesla K80    4       4           4992/GPU = 79872 total    46.56 TFlops
Total                               21      50 GPUs     120704 total              76.22 TFlops
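The totals row can be checked against the per-row figures; a minimal sketch:

```python
# Verify the Gaia GPU totals from the per-row figures:
# (node range, #nodes, GPUs per node, cores per GPU, aggregate RPeak in TFlops).
rows = [
    ("gaia-[61-62]",    2, 2,  448,  2.06),   # Tesla M2070
    ("gaia-[63-72]",   10, 2,  512, 13.3),    # Tesla M2090
    ("gaia-[75-79]",    5, 2, 2880, 14.3),    # Tesla K40m
    ("gaia-[179-182]",  4, 4, 4992, 46.56),   # Tesla K80
]

total_gpus  = sum(n * g for _, n, g, _, _ in rows)
total_cores = sum(n * g * c for _, n, g, c, _ in rows)
total_rpeak = sum(r for *_, r in rows)

print(total_gpus)             # 50
print(total_cores)            # 120704
print(round(total_rpeak, 2))  # 76.22
```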

Features of Tesla-class GPUs on gaia

  • GPU powered by the massively parallel CUDA architecture, delivering cluster-class performance at 1/20th the power and 1/10th the cost of CPU-only systems built on quad-core CPUs.
  • IEEE 754 single and double Precision floating point units.
  • ECC Support
  • System Monitoring Features
  • Designed for Maximum Reliability – passive heatsink design eliminates moving parts and cables.
  • 6 or 12GB of GDDR5 memory per GPU
  • NVIDIA Parallel DataCache – accelerates algorithms such as physics solvers, ray-tracing, and sparse matrix multiplication where data addresses are not known beforehand.
  • NVIDIA GigaThread Engine – maximizes the throughput by faster context switching, concurrent kernel execution, and improved thread block scheduling.
  • Asynchronous Transfer – improves system throughput by overlapping data transfers with computation, even while the compute cores are busy.

Accelerator specifications

Characteristics               Tesla M2070        Tesla M2090        Tesla K40m               Tesla K80                Tesla V100
Chip                          Tesla T20 GPU      Tesla T20A GPU     Tesla K40m GPU           Tesla K80 GPU            Tesla V100 GPU
Compute capability (version)  2.0                2.0                3.5                      3.7                      7.0
Processor clock               1.15 GHz           1.3 GHz            745 MHz (boost 875 MHz)  560 MHz (boost 875 MHz)  1300 MHz (boost 1530 MHz)
Processing cores              448                512                2880                     4992 (2 x 2496)          5120 CUDA + 640 Tensor
Memory I/O                    384-bit GDDR5      384-bit GDDR5      384-bit GDDR5            384-bit GDDR5            4096-bit HBM2
Memory bandwidth (ECC off)    150 GB/s           177 GB/s           288 GB/s                 480 GB/s                 900 GB/s
Double precision (FP64) peak  515 GFlops         665 GFlops         1.43 TFlops              2.91 TFlops              7.8 TFlops
Single precision (FP32) peak  1.03 TFlops        1.331 TFlops       4.29 TFlops              8.74 TFlops              15.7 TFlops
Total dedicated memory        6 GB GDDR5         6 GB GDDR5         12 GB                    24 GB (2 x 12 GB)        16 GB
Max power consumption         225 W              225 W              235 W                    300 W                    300 W
Thermal cooling solution      Passive heat sink  Passive heat sink  Passive heat sink        Passive heat sink        Passive heat sink
System interface              PCIe x16 Gen2      PCIe x16 Gen2      PCIe x16 Gen3            PCIe x16 Gen3            NVLink 2.0

Software development tools: CUDA C/C++/Fortran, OpenCL and DirectCompute on all models, plus Parallel Nsight and ArrayFire up to the K80, and OpenACC on the V100.
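The FP64 peaks in the table can be cross-checked from the core counts and clocks, assuming the usual FP64:FP32 unit ratios for Tesla-class parts of each generation (1:2 on Fermi, 1:3 on Kepler) and 2 FLOPs per FP64 unit per cycle (FMA). These ratios are architectural assumptions, not stated on this page; note the K40m figure is quoted at the base clock while the K80 figure is quoted at the boost clock:

```python
# Hypothetical cross-check of the FP64 peak figures in the table above.
def fp64_peak_gflops(cuda_cores: int, fp64_ratio: int, clock_ghz: float) -> float:
    """Peak FP64 GFlops: (FP64 units) x 2 FLOPs/cycle x clock."""
    return (cuda_cores // fp64_ratio) * 2 * clock_ghz

print(round(fp64_peak_gflops(448, 2, 1.15)))    # ~515  (M2070, Fermi 1:2)
print(round(fp64_peak_gflops(512, 2, 1.30)))    # ~666  (M2090, quoted as 665)
print(round(fp64_peak_gflops(2880, 3, 0.745)))  # ~1430 (K40m at base clock, 1.43 TFlops)
print(round(fp64_peak_gflops(4992, 3, 0.875)))  # ~2912 (K80 at boost clock, 2.91 TFlops)
```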