Special hardware (GPUs, binf nodes) available & how to use it

March 07, 2021

TOP500 List November 2020

Rank  Machine             Performance    Accelerators
 1.   Fugaku              442 PFlop/s
 2.   Summit              149 PFlop/s    NVIDIA V100
 3.   Sierra               95 PFlop/s    NVIDIA V100
 4.   Sunway TaihuLight    93 PFlop/s
 5.   Selene               64 PFlop/s    NVIDIA A100
 6.   Tianhe-2A            62 PFlop/s
 7.   JUWELS Booster       44 PFlop/s    NVIDIA A100
 8.   HPC5                 36 PFlop/s    NVIDIA V100
 9.   Frontera             24 PFlop/s    NVIDIA RTX5000/V100
10.   Dammam-7             21 PFlop/s    NVIDIA V100

Components on VSC-3

Model                     Nodes                                  #cores     Clock (GHz)  Memory (GB)  Bandwidth (GB/s)  TDP (W)  FP32/FP64 (GFLOPs/s)
36+50x GeForce GTX-1080   n37[1,2,3]-[001-004,001-022,001-028]   2560       1.61         8            320               180      8228/257
4x Tesla K20m             n372-02[4,5]                           2496       0.71         5            208               195      3520/1175
1x Tesla V100             n372-023                               5120/640   1.31         32           900               250      14000/7000

(V100: 5120 CUDA cores / 640 Tensor cores)

Working on GPU nodes

Interactive mode

1. VSC-3 >  salloc -N 1 -p gpu_gtx1080single --qos gpu_gtx1080single 

2. VSC-3 >  squeue -u $USER

3. VSC-3 >  srun -n 1 hostname   (while still on the login node!)

4. VSC-3 >  ssh n372-012   (or whichever node was assigned)

5. VSC-3 >  module load cuda/9.1.85    
            cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMul
            nvcc ./matrixMul.cu  
            ./a.out 

            cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMulCUBLAS
            nvcc matrixMulCUBLAS.cu -lcublas
            ./a.out

6. VSC-3 >  nvidia-smi

7. VSC-3 >  /opt/sw/x86_64/glibc-2.17/ivybridge-ep/cuda/9.1.85/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery/deviceQuery

Working on GPU nodes cont.

SLURM submission script gpu_test.scrpt

#!/bin/bash
#
#  usage: sbatch ./gpu_test.scrpt          
#
#SBATCH -J gtx1080     
#SBATCH -N 1
#SBATCH --partition gpu_gtx1080single         
#SBATCH --qos gpu_gtx1080single

module purge
module load cuda/9.1.85

nvidia-smi
/opt/sw/x86_64/glibc-2.17/ivybridge-ep/cuda/9.1.85/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery/deviceQuery      

Exercise/Example/Problem:
Using interactive mode or batch submission, find out whether ECC is enabled on the GPUs of type gtx1080.

Working on binf nodes

Interactive mode

1. VSC-3 >  salloc -N 1 -p binf --qos normal_binf -C binf -L intel@vsc
            (add   --nodelist binf-13   to request a specific node)

2. VSC-3 >  squeue -u $USER

3. VSC-3 >  srun -n 4 hostname   (while still on the login node!)

4. VSC-3 >  ssh binf-11   (or whichever node was assigned)

5. VSC-3 >  module purge

6. VSC-3 >  module load intel/17 
            cd examples/09_special_hardware/binf
            icc -xHost -qopenmp sample.c
            export OMP_NUM_THREADS=8
            ./a.out

Working on binf nodes cont.

SLURM submission script slrm.sbmt.scrpt

#!/bin/bash
#
#  usage: sbatch ./slrm.sbmt.scrpt          
#
#SBATCH -J gmxbinfs    
#SBATCH -N 2
#SBATCH --partition binf        
#SBATCH --qos normal_binf         
#SBATCH -C binf        
#SBATCH --ntasks-per-node 24
#SBATCH --ntasks-per-core 1

module purge
module load intel/17  intel-mkl/2017  intel-mpi/2017  gromacs/5.1.4_binf

export I_MPI_PIN=1
export I_MPI_PIN_PROCESSOR_LIST=0-23
export I_MPI_FABRICS=shm:tmi          
export I_MPI_TMI_PROVIDER=psm2        
export OMP_NUM_THREADS=1      
export MDRUN_ARGS=" -dd 0 0 0 -rdd 0 -rcon 0 -dlb yes -dds 0.8  -tunepme -v -nsteps 10000 " 

mpirun -np $SLURM_NTASKS gmx_mpi mdrun ${MDRUN_ARGS}  -s hSERT_5HT_PROD.0.tpr  -deffnm hSERT_5HT_PROD.0  -px hSERT_5HT_PROD.0_px.xvg  -pf hSERT_5HT_PROD.0_pf.xvg  -swap hSERT_5HT_PROD.0.xvg

Real-World Example, AMBER-16

[Figure: performance and power efficiency of AMBER-16 runs]
