The Chaos Cluster
Overview
The cluster is organized as follows:
It is composed of the following computing elements:
The computing nodes of this cluster are thus quite heterogeneous, yet they all share the same processor architecture (Intel 64-bit). Code compiled on one node should therefore run on all the others, unless it relies on features specific to newer processors, such as the AVX instruction set.
Previous processor generations (e.g. Westmere) only support 4 floating-point operations per cycle.
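The operations-per-cycle figure above feeds directly into the theoretical peak performance (RPeak) values quoted in the History section. As a rough sketch, assuming RPeak is simply cores x clock frequency x floating-point operations per cycle, those numbers can be reproduced as shown below. The function name `rpeak_gflops` and the `NODE_CLASSES` table are illustrative only, and the value of 8 operations per cycle used for the AVX-capable Sandy Bridge nodes is an assumption that this page does not state.

```python
def rpeak_gflops(nodes, sockets_per_node, cores_per_socket, ghz, flops_per_cycle):
    """Return (total cores, theoretical peak in GFlops) for one node class."""
    cores = nodes * sockets_per_node * cores_per_socket
    return cores, cores * ghz * flops_per_cycle


# (nodes, sockets per node, cores per socket, clock in GHz, flops per cycle)
# Node counts and clocks come from the History section of this page.
NODE_CLASSES = {
    "k-cluster1": (16, 1, 2, 3.2, 4),   # Pentium D, dual core
    "b-cluster1": (2, 4, 2, 3.4, 4),    # 4 dual-core Xeon per node
    "h-cluster1": (32, 2, 6, 2.26, 4),  # Westmere L5640, 6 cores per socket
    # ASSUMPTION: 8 flops/cycle with AVX; not stated on this page.
    "e-cluster1": (16, 2, 8, 2.2, 8),   # Sandy Bridge E5-2660, 8 cores per socket
}

for name, spec in NODE_CLASSES.items():
    cores, gflops = rpeak_gflops(*spec)
    print(f"{name}: {cores} cores, {gflops:.1f} GFlops")
```

For the three pre-AVX classes this gives 409.6, 217.6 and 3471.4 GFlops, which agree with the rounded values listed in the History section. These are upper bounds; sustained performance on real workloads is lower.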
The picture below shows one of the racks hosting Chaos cluster components.
Interconnect
The interconnect is composed of an Infiniband QDR (40Gb/s) network.
The choice of topology is imposed by the heterogeneous nature of Chaos, and by the fact that the hardware is split across 2 server rooms.
The following diagram shows the topology of the Chaos Infiniband network.
Additionally, the cluster is connected to the infrastructure of the University using 10Gb Ethernet.
A third 1Gb Ethernet network is also used on the cluster, mainly for services and administration purposes.
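To give a feel for the gap between the three networks, the sketch below computes an idealized transfer time for a fixed amount of data at each nominal link rate. The 100 GB payload and the helper name `transfer_seconds` are illustrative choices, and the calculation assumes the full nominal rate is available, which is never quite true in practice because of encoding and protocol overhead.

```python
# Nominal link rates in Gb/s, as quoted in this section.
LINKS_GBPS = {
    "Infiniband QDR": 40,
    "10Gb Ethernet": 10,
    "1Gb Ethernet": 1,
}


def transfer_seconds(size_gb, link_gbps):
    """Ideal time in seconds to move size_gb gigabytes over a link_gbps link."""
    return size_gb * 8 / link_gbps  # 8 bits per byte


for name, rate in LINKS_GBPS.items():
    print(f"{name}: {transfer_seconds(100, rate):.0f} s for 100 GB")
```

Under these assumptions the same 100 GB takes 20 s, 80 s and 800 s respectively.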
Storage / Cluster File System
In terms of storage, a dedicated NFS server is responsible for sharing specific folders (most importantly, users' home directories) across the nodes of the cluster.
The hardware consists of a NetApp E5400 disk enclosure containing 60 disks (3 TB SAS, 7.2 krpm). The raw capacity is 180 TB, split into 5 RAID 6 groups of 10 disks each (8+2); the 10 remaining disks are used as spares.
An additional storage device (of the same capacity) is used as the backup target. The filesystem is XFS over LVM (Logical Volume Manager).
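The capacity figures above follow from simple arithmetic on the RAID layout. The sketch below works through it, assuming that each RAID 6 group contributes only its 8 data disks to usable space and ignoring filesystem and LVM overhead. The variable names are illustrative.

```python
DISK_TB = 3          # capacity per disk, in decimal terabytes
TOTAL_DISKS = 60
RAID_GROUPS = 5
DATA_DISKS = 8       # data disks per RAID 6 group
PARITY_DISKS = 2     # parity disks per RAID 6 group

raw_tb = TOTAL_DISKS * DISK_TB
spares = TOTAL_DISKS - RAID_GROUPS * (DATA_DISKS + PARITY_DISKS)
usable_tb = RAID_GROUPS * DATA_DISKS * DISK_TB
usable_tib = usable_tb * 1e12 / 2**40  # same space in binary tebibytes

print(f"raw capacity:    {raw_tb} TB")
print(f"spare disks:     {spares}")
print(f"usable capacity: {usable_tb} TB (about {usable_tib:.1f} TiB)")
```

This yields 180 TB raw, 10 spares and 120 TB usable, or roughly 109 TiB. The 110 TB total quoted in the History section may correspond to that last figure, although the page does not say so.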
History
The Chaos cluster has existed since 2007 to serve the computing needs of the University of Luxembourg.
The platform has evolved as follows:
- 2007: Initialization of the cluster, composed of 1 frontend, 1 NFS server (net capacity: 3 TB) and 18 computing nodes, divided into two classes:
  - k-cluster1-[1-16]: Dell PE850 (1U) (1 Pentium D @ 3.2 GHz, 4 GB RAM). Total: 32 computing cores, 410 GFlops
  - b-cluster1-[1-2]: Dell PE6850 (4U) (4 Dual Core Xeon @ 3.4 GHz, 32 GB RAM). Total: 16 computing cores, 218 GFlops
- 2010: Extension with 1 HP blade enclosure (10U):
  - h-cluster1-[1-32]: HP Proliant BL2x220c G6 (2 Xeon Westmere L5640 @ 2.26 GHz, 24 GB RAM) for a total of 384 cores (RPeak = 3.472 TFlops)
- 2011: Storage and computing capacity extension
  - Increased storage capacity with an upgrade of the disks in the storage bay. Total capacity: 21.83 TB.
  - d-cluster1-[1-16]: Dell M610 (2 Xeon Westmere L5640 @ 2.26 GHz, 24 GB RAM) for a total of 192 cores (RPeak = 1.736 TFlops)
- 2012: Storage, computing capacity and interconnect extension
  - Increased storage capacity with a new disk enclosure and a new NFS server. Total capacity now raised to 110 TB.
  - e-cluster1-[1-16]: Dell M620 (2 Xeon Sandy-Bridge E5-2660 @ 2.20 GHz, 32 GB RAM) for a total of 256 cores
  - s-cluster1-[1-16]: HP SL230S (2 Xeon Sandy-Bridge E5-2660 @ 2.20 GHz, 32 GB RAM) for a total of 256 cores
  - Fast Infiniband QDR interconnect (Mellanox based)
  - Old 'k' and 'b' class nodes decommissioned
- 2014: Memory upgrade from 24 GB to 48 GB on d-cluster1-[1-16]