HPC @ Uni.lu

High Performance Computing in Luxembourg

The following GPGPU accelerators are currently available on the Gaia cluster:

| Node | Model | #nodes | #GPU boards | #GPU cores | RPeak |
|---|---|---|---|---|---|
| gaia-[61-62] | NVIDIA Tesla M2070 | 2 | 2 / node | 448c / board = 1792c | 2.06 TFlops |
| gaia-[63-72] | NVIDIA Tesla M2090 | 10 | 2 / node | 512c / board = 10240c | 13.3 TFlops |
| gaia-[75-79] | NVIDIA Tesla K40m | 5 | 2 / node | 2880c / board = 28800c | 14.3 TFlops |
| gaia-[179-182] | NVIDIA Tesla K80 | 4 | 4 / node | 4992c / board = 79872c | 46.56 TFlops |
| Total | | 21 | 50 | 120704c | 76.22 TFlops |
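
The totals row can be re-derived from the per-model rows. The following is a minimal sketch in Python; the names `INVENTORY` and `totals` are illustrative, and the per-board double-precision figures are taken from the specifications table further down this page, on the assumption that RPeak is quoted in double precision.

```python
# Per-model inventory copied from the table above:
# (model, nodes, GPU boards per node, cores per board, DP TFlops per board)
INVENTORY = [
    ("Tesla M2070", 2, 2, 448, 0.515),
    ("Tesla M2090", 10, 2, 512, 0.665),
    ("Tesla K40m", 5, 2, 2880, 1.43),
    ("Tesla K80", 4, 4, 4992, 2.91),
]


def totals(inventory):
    """Return (nodes, boards, cores, RPeak in TFlops) summed over all models."""
    nodes = sum(n for _, n, _, _, _ in inventory)
    boards = sum(n * b for _, n, b, _, _ in inventory)
    cores = sum(n * b * c for _, n, b, c, _ in inventory)
    rpeak = sum(n * b * t for _, n, b, _, t in inventory)
    return nodes, boards, cores, round(rpeak, 2)


print(totals(INVENTORY))  # (21, 50, 120704, 76.22)
```

The result matches the totals row: 21 nodes, 50 GPU boards, 120704 cores and 76.22 TFlops.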

Features of Tesla-class GPUs

  • GPUs powered by the massively parallel CUDA architecture. NVIDIA quotes cluster performance at 1/20th the power and 1/10th the cost of CPU-only systems based on quad-core CPUs of the same generation.
  • IEEE 754 single- and double-precision floating-point units.
  • ECC support
  • System Monitoring Features
  • Designed for Maximum Reliability – passive heatsink design eliminates moving parts and cables.
  • 6 or 12 GB of GDDR5 memory per GPU (5 GB on the Tesla K20m)
  • NVIDIA Parallel DataCache – accelerates algorithms such as physics solvers, ray-tracing, and sparse matrix multiplication where data addresses are not known beforehand.
  • NVIDIA GigaThread Engine – maximizes the throughput by faster context switching, concurrent kernel execution, and improved thread block scheduling.
  • Asynchronous Transfer – improves system performance by carrying out data transfers while the computing cores are busy.
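
To illustrate why overlapping transfers with computation matters, here is a toy timing model in Python. It is a sketch under simplifying assumptions rather than a measurement: the work is split into equal chunks, only host-to-device copies are counted, one copy can run alongside one kernel, and the chunk timings are made-up values. The function names `serial_time` and `overlapped_time` are illustrative.

```python
def serial_time(chunks, copy_ms, kernel_ms):
    """Each chunk is copied, then processed, with no overlap at all."""
    return chunks * (copy_ms + kernel_ms)


def overlapped_time(chunks, copy_ms, kernel_ms):
    """Two-stage pipeline: the copy of chunk i+1 runs during the kernel of chunk i.

    The first copy and the last kernel cannot be hidden; in between, the
    slower of the two stages sets the pace.
    """
    if chunks == 0:
        return 0
    return copy_ms + kernel_ms + (chunks - 1) * max(copy_ms, kernel_ms)


# Hypothetical workload: 8 chunks, 2 ms per copy, 3 ms per kernel.
print(serial_time(8, 2, 3))      # 40
print(overlapped_time(8, 2, 3))  # 26
```

In this model the benefit is largest when copy and kernel times are similar, and it vanishes when one stage dominates the other.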

Accelerator specifications

| Characteristics | Tesla M2070 | Tesla M2090 | Tesla K20m | Tesla K40m | Tesla K80 |
|---|---|---|---|---|---|
| Chip | Tesla T20 GPU | Tesla T20A GPU | Tesla K20m GPU | Tesla K40m GPU | Tesla K80 GPU |
| Compute capability (version) | 2.0 | 2.0 | 3.5 | 3.5 | 3.7 |
| Processor clock | 1.15 GHz | 1.3 GHz | 706 MHz | 745 MHz (boost to 875 MHz) | 560 MHz (boost to 875 MHz) |
| Number of processing cores | 448c | 512c | 2496c | 2880c | 4992c (2 x 2496c) |
| Memory clock | 1.566 GHz | 1.85 GHz | 2.6 GHz | 3 GHz | 2.5 GHz |
| Memory I/O | 384-bit GDDR5 | 384-bit GDDR5 | 320-bit GDDR5 | 384-bit GDDR5 | 384-bit GDDR5 |
| Memory bandwidth (ECC off) | 150 GB/s | 177 GB/s | 208 GB/s | 288 GB/s | 480 GB/s |
| Double-precision floating-point performance | 515 GFlops | 665 GFlops | 1.17 TFlops | 1.43 TFlops | 2.91 TFlops |
| Single-precision floating-point performance | 1.03 TFlops | 1.331 TFlops | 3.52 TFlops | 4.29 TFlops | 8.74 TFlops |
| Total dedicated memory | 6 GB GDDR5 | 6 GB GDDR5 | 5 GB | 12 GB | 24 GB (2 x 12 GB) |
| Max power consumption | 225 W | 225 W | 225 W | 235 W | 300 W |
| Thermal cooling solution | Passive heat sink | Passive heat sink | Passive heat sink | Passive heat sink | Passive heat sink |
| System interface | PCIe x16 Gen2 | PCIe x16 Gen2 | PCIe x16 Gen2 | PCIe x16 Gen3 | PCIe x16 Gen3 |

Software development tools (all models): CUDA C/C++/Fortran, OpenCL, DirectCompute, Parallel Nsight, ArrayFire
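
The peak performance and bandwidth figures can be reproduced from the clock, core count and bus width rows. The Python sketch below shows the arithmetic under three assumptions that are not stated in the table. First, each core completes one fused multiply-add (2 flops) per cycle. Second, the double- to single-precision ratio is 1/2 on the M20xx boards and 1/3 on the K-series boards. Third, memory moves two transfers per listed memory clock cycle. The K80 entry uses the boost clock and counts both GPUs, since that is what matches its quoted figures. All names (`SPECS`, `derive`, and so on) are illustrative.

```python
# Inputs copied from the specifications table; dp_ratio is an assumption.
SPECS = {
    # model: (GPUs, cores per GPU, clock GHz, dp_ratio, bus bits, mem clock GHz)
    "Tesla M2070": (1, 448, 1.15, 1 / 2, 384, 1.566),
    "Tesla M2090": (1, 512, 1.30, 1 / 2, 384, 1.85),
    "Tesla K20m": (1, 2496, 0.706, 1 / 3, 320, 2.6),
    "Tesla K40m": (1, 2880, 0.745, 1 / 3, 384, 3.0),
    "Tesla K80": (2, 2496, 0.875, 1 / 3, 384, 2.5),  # boost clock, 2 GPUs
}


def peak_sp_gflops(gpus, cores, clock_ghz):
    """Single-precision peak: 2 flops (one FMA) per core per cycle."""
    return 2 * gpus * cores * clock_ghz


def bandwidth_gb_s(gpus, bus_bits, mem_clock_ghz):
    """Peak bandwidth, assuming two transfers per listed memory clock cycle."""
    return gpus * (bus_bits / 8) * mem_clock_ghz * 2


def derive(model):
    """Return the derived peak figures for one model as a dict."""
    gpus, cores, clock, dp_ratio, bus_bits, mem_clock = SPECS[model]
    sp = peak_sp_gflops(gpus, cores, clock)
    return {
        "sp_gflops": sp,
        "dp_gflops": sp * dp_ratio,
        "bandwidth_gb_s": bandwidth_gb_s(gpus, bus_bits, mem_clock),
    }


for model in SPECS:
    d = derive(model)
    print(f"{model}: SP {d['sp_gflops']:.0f} GFlops, "
          f"DP {d['dp_gflops']:.0f} GFlops, {d['bandwidth_gb_s']:.0f} GB/s")
```

For example, the Tesla M2090 works out to about 1331 GFlops (1.331 TFlops) in single precision and about 178 GB/s of memory bandwidth.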