VSC – supercomputers

March 8, 2021

OUTLINE:

VSC – Vienna Scientific Cluster


VSC links:                   Information provided:
https://vsc.ac.at            VSC homepage (general info)
https://service.vsc.ac.at    VSC service website (application)
https://wiki.vsc.ac.at       VSC user documentation
                             VSC user support & contact

Supercomputers for beginners

System         Performance   TOP500 rank     GREEN500 rank   #1 TOP500
VSC-1 (2009)   35 TFlop/s    156 (11/2009)   94 (06/2009)    1.8 PFlop/s (11/2009)
VSC-2 (2011)   135 TFlop/s   56 (06/2011)    71 (06/2011)    8 PFlop/s (06/2011)
VSC-3 (2014)   596 TFlop/s   85 (11/2014)    86 (11/2014)    33 PFlop/s (11/2014)
VSC-3 (………)    596 TFlop/s   461 (11/2017)   175 (11/2017)   93 PFlop/s (11/2017)
VSC-4 (2019)   2.7 PFlop/s   82 (06/2019)    ———             148 PFlop/s (06/2019)
VSC-4 (………)    2.7 PFlop/s   124 (11/2020)   ———             442 PFlop/s (11/2020)
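
To put the ranking list above into proportion, the following sketch computes each VSC system's performance as a percentage of the #1 TOP500 system at the same listing. The numbers are copied from the list; the function name and the row labels are chosen here for illustration only.

```python
# (label, VSC performance in TFlop/s, #1 TOP500 performance in TFlop/s),
# all values copied from the ranking list above
ROWS = [
    ("VSC-1 (11/2009)", 35.0, 1800.0),
    ("VSC-2 (06/2011)", 135.0, 8000.0),
    ("VSC-3 (11/2014)", 596.0, 33000.0),
    ("VSC-3 (11/2017)", 596.0, 93000.0),
    ("VSC-4 (06/2019)", 2700.0, 148000.0),
    ("VSC-4 (11/2020)", 2700.0, 442000.0),
]


def share_of_number_one(vsc_tflops, top_tflops):
    """Return the VSC performance as a percentage of the #1 system."""
    return 100.0 * vsc_tflops / top_tflops


if __name__ == "__main__":
    for label, vsc, top in ROWS:
        print(f"{label}: {share_of_number_one(vsc, top):.2f} % of #1")
```

With these numbers, each system enters the list at a bit under 2 % of the #1 system and is below 1 % at the later listing.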

VSC systems – what do they look like?


VSC-4 – components of a supercomputer


Parallel hardware architectures

shared memory:
  • socket / CPU:  OpenMP, …  ➠ MPI (everywhere)
  • node:          OpenMP, …  ➠ MPI (everywhere)

distributed memory:
  • cluster:       MPI, …
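
The split between OpenMP inside a node and MPI everywhere leaves a choice of how many MPI ranks and how many OpenMP threads per rank to run on one node. The sketch below enumerates the layouts that use every core exactly once and keep all threads of a rank inside one socket. The node shape in the usage note (2 sockets with 8 cores each) is an assumption for illustration, not a figure taken from this section, and `hybrid_layouts` is a hypothetical helper name.

```python
def hybrid_layouts(sockets_per_node, cores_per_socket):
    """Return (MPI ranks per node, OpenMP threads per rank) pairs.

    Every core is used exactly once and the threads of one rank never
    span two sockets, so threads-per-rank must divide cores_per_socket.
    """
    layouts = []
    for threads in range(1, cores_per_socket + 1):
        if cores_per_socket % threads == 0:
            ranks = sockets_per_node * (cores_per_socket // threads)
            layouts.append((ranks, threads))
    return layouts


if __name__ == "__main__":
    # Assumed node shape: 2 sockets x 8 cores (illustrative only).
    for ranks, threads in hybrid_layouts(2, 8):
        print(f"{ranks:2d} MPI ranks x {threads} OpenMP threads")
```

For the assumed node, the first layout (16 ranks, 1 thread each) is pure MPI and the last one (2 ranks, 8 threads each) places one rank on each socket.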

VSC compute nodes

processing units (PU#) ➠ pinning
see: talk on SLURM, and pinning in the VSC wiki
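
As a rough illustration of what pinning means, the sketch below restricts the calling process to a single processing unit using Python's standard `os.sched_getaffinity` and `os.sched_setaffinity`, which are available on Linux only. This is not how SLURM or an MPI library configures pinning on VSC; it only shows the underlying idea, and `pin_to_one_pu` is a hypothetical helper name.

```python
import os


def pin_to_one_pu():
    """Pin the calling process to the lowest-numbered PU it may use.

    Returns the chosen PU number, or None where the affinity API is
    not available (it exists on Linux only).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # set of PU numbers, 0 = this process
    target = min(allowed)
    os.sched_setaffinity(0, {target})   # from now on, run only on this PU
    return target


if __name__ == "__main__":
    pu = pin_to_one_pu()
    if pu is None:
        print("CPU affinity is not supported on this platform")
    else:
        print(f"pinned to PU#{pu}; affinity is now {os.sched_getaffinity(0)}")
```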

memory hierarchy (mem_0064 nodes):
L1 data cache: 32 kB, private to core
L2 cache: 256 kB, private to core (unified)
L3 cache: 20 MB, shared by all cores of 1 socket
memory: 32 GB per socket
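
The sizes above decide where a working set can live. The following sketch returns the smallest level of the hierarchy that can hold a given number of bytes. It assumes binary units (1 kB = 1024 bytes) and ignores everything else that competes for cache space, so it gives an upper bound rather than a prediction; the names are chosen here for illustration.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

# Capacities from the mem_0064 node description, smallest level first.
LEVELS = [
    ("L1 data cache", 32 * KB),
    ("L2 cache", 256 * KB),
    ("L3 cache", 20 * MB),
    ("memory (one socket)", 32 * GB),
]


def smallest_level_that_fits(nbytes):
    """Return the name of the smallest level that can hold nbytes."""
    for name, capacity in LEVELS:
        if nbytes <= capacity:
            return name
    return "larger than one socket's memory"


if __name__ == "__main__":
    for n in (100, 1000, 2000):
        size = n * n * 8  # n x n matrix of 8-byte doubles
        print(f"{n} x {n} doubles: {smallest_level_that_fits(size)}")
```

For example, a 1000 x 1000 matrix of doubles (8 MB) still fits into the L3 cache of one socket, while a 2000 x 2000 matrix (32 MB) does not.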

VSC hardware overview

VSC node-interconnect schematic

2-level fat-tree – schematic figure

VSC node-interconnect schematic (level – 1)

2-level fat-tree – compute nodes are attached to the lowest level

VSC node-interconnect schematic (level – 2)

VSC-4: single rail Intel Omnipath, 2-level fat-tree (BF = 2:1)

VSC node-interconnect schematic (level – 3)

VSC-3: dual rail Intel QDR-80, 3-level fat-tree (BF = 2:1 / 4:1)

VSC-4: single rail Intel Omnipath, 2-level fat-tree (BF = 2:1)
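
Reading BF as the blocking factor (an assumption about the abbreviation), a ratio of 2:1 means that an edge switch has twice as many ports towards the compute nodes as towards the next level. The sketch below works this out for a 2-level fat-tree. The 48-port switch size is a hypothetical value chosen for illustration, and the node bound assumes that every edge switch sends exactly one uplink to each spine switch and that spine switches have the same port count.

```python
def split_ports(ports, down, up):
    """Split a switch's ports into (down links, up links) for BF = down:up."""
    if ports % (down + up) != 0:
        raise ValueError("port count must be a multiple of down + up")
    unit = ports // (down + up)
    return down * unit, up * unit


def max_nodes_two_level(ports, down, up):
    """Upper bound on compute nodes in a 2-level fat-tree.

    Assumes one uplink from every edge switch to every spine switch, so
    a spine switch with `ports` ports can serve `ports` edge switches.
    """
    down_ports, _up_ports = split_ports(ports, down, up)
    edge_switches = ports
    return edge_switches * down_ports


if __name__ == "__main__":
    down_ports, up_ports = split_ports(48, 2, 1)
    print(f"per edge switch: {down_ports} node links, {up_ports} uplinks")
    print(f"at most {max_nodes_two_level(48, 2, 1)} nodes")
```

With a blocking factor of 2:1, two node links share one uplink, so when all nodes of an edge switch talk to other switches at once, each gets at most half of its link bandwidth.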

VSC-3 ping-pong – intra-node vs. inter-node

  • MPI latency & bandwidth:
VSC-3             latency
intra-socket      0.3 μs
inter-socket      0.7 μs
IB -1- edge       1.4 μs
IB -2- leaf       1.8 μs
IB -3- spine      2.3 μs

  • typical values:
                                    latency    bandwidth
L1 cache                            1–2 ns     100 GB/s
L2/L3 cache                         3–10 ns    50 GB/s
memory                              100 ns     10 GB/s
HPC networks (per node 2 HCAs)      1–10 μs    1–8 GB/s
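
Latency and bandwidth together determine how long a message takes. A common first-order model, used here as a sketch and not as a description of the measured VSC-3 curves, is time = latency + size / bandwidth. The example takes the 1.4 μs edge-level latency from the list above and assumes a bandwidth of 4 GB/s, a value picked from the typical 1–8 GB/s range for illustration.

```python
def transfer_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Time in seconds to send nbytes under the latency/bandwidth model."""
    return latency_s + nbytes / bandwidth_bytes_per_s


def effective_bandwidth(nbytes, latency_s, bandwidth_bytes_per_s):
    """Achieved bandwidth in bytes/s for a message of nbytes."""
    return nbytes / transfer_time(nbytes, latency_s, bandwidth_bytes_per_s)


def half_bandwidth_size(latency_s, bandwidth_bytes_per_s):
    """Message size at which half of the peak bandwidth is reached."""
    return latency_s * bandwidth_bytes_per_s


if __name__ == "__main__":
    latency = 1.4e-6   # s, "IB -1- edge" value from the list above
    bandwidth = 4e9    # bytes/s, assumed value within the 1-8 GB/s range
    for size in (8, 1024, 1024 * 1024):
        bw = effective_bandwidth(size, latency, bandwidth)
        print(f"{size:8d} bytes: {bw / 1e9:.3f} GB/s effective")
    print(f"half of peak at {half_bandwidth_size(latency, bandwidth):.0f} bytes")
```

Under these assumptions, messages below a few kilobytes are dominated by latency, which is why many small messages cost more than one large message carrying the same data.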

