Installation instructions for Vlasiator on EuroHPC systems
Compilation instructions for various EuroHPC systems as of 05/2025
LUMI-G
Use the default login module environment as a base (do not purge modules or add module loads to your .bashrc)
module load LUMI/24.03
module load partition/G
module load cpeAMD
module load rocm/6.2.2
module load Boost/1.83.0-cpeAMD-24.03
module load papi/7.1.0.1
export PATH=$PATH:/appl/lumi/SW/LUMI-24.03/G/EB/rocm/6.2.2/bin/
./build_libraries.sh lumi_hipcc
export VLASIATOR_ARCH=lumi_hipcc
See the LUMI-G launch instructions at https://docs.lumi-supercomputer.eu/runjobs/scheduled-jobs/lumig-job/ for example job scripts
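As a reference, a minimal LUMI-G batch script could look like the sketch below. The project ID, node count, wall time and thread count are placeholders/assumptions; take the exact CPU binding masks and the GPU-mapping wrapper from the LUMI-G documentation linked above. $EXE and $CFG stand for the Vlasiator binary and configuration file, as in the Karolina examples further down.

#!/bin/bash -l
#SBATCH --account=project_XXXXXXXXX    # placeholder project ID
#SBATCH --partition=standard-g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8            # one MPI rank per GCD (8 GCDs per LUMI-G node)
#SBATCH --gpus-per-node=8
#SBATCH --time=01:00:00
export OMP_NUM_THREADS=6               # assumption: adjust to the cores available per rank
export OMP_PLACES=cores
export OMP_PROC_BIND=close
# Bind each rank to the cores closest to its GCD; copy the exact
# --cpu-bind masks (and the ROCR_VISIBLE_DEVICES wrapper) from the LUMI-G docs.
srun --cpu-bind=mask_cpu:<masks-from-LUMI-docs> $EXE $CFG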
LUMI-C
Use the default login module environment as a base (do not purge modules or add module loads to your .bashrc).
module load LUMI/24.03
module load Boost/1.83.0-cpeGNU-24.03
module load partition/C
module load papi/7.1.0.1
./build_libraries.sh lumi_2403
export VLASIATOR_ARCH=lumi_2403
Use srun to launch.
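For reference, a minimal LUMI-C batch script could look like the following sketch; the project ID, partition name and task geometry are placeholders/assumptions (LUMI-C nodes have 128 cores), and $EXE/$CFG stand for the Vlasiator binary and configuration file.

#!/bin/bash -l
#SBATCH --account=project_XXXXXXXXX    # placeholder project ID
#SBATCH --partition=standard           # assumption: LUMI-C standard partition
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=8              # 16 ranks x 8 threads = 128 cores per node
#SBATCH --time=01:00:00
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PLACES=cores
export OMP_PROC_BIND=close
srun --cpu-bind=cores $EXE $CFG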
BSC MARE NOSTRUM 5 GPP (CPU)
export VLASIATOR_ARCH=MN5_gpp
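By analogy with the other systems on this page, the libraries are presumably built with the matching target of build_libraries.sh; this is an assumption, so check the targets actually provided by the script.

./build_libraries.sh MN5_gpp           # assumption: target name matches VLASIATOR_ARCH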
BSC MARE NOSTRUM 5 ACC (GPU)
Not yet implemented
CINECA LEONARDO BOOSTER
module load gcc/12.2.0 openmpi/4.1.6--gcc--12.2.0 nvhpc/23.11 cuda/12.1
./build_libraries.sh leonardo_booster
export VLASIATOR_ARCH=leonardo_booster
Use srun to launch.
Thread placement appears to be good (not fully verified).
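For reference, a minimal Leonardo Booster batch script could look like the sketch below; the account, partition name and resource counts are assumptions (Booster nodes have 32 CPU cores and 4 A100 GPUs), and $EXE/$CFG stand for the Vlasiator binary and configuration file.

#!/bin/bash
#SBATCH --account=<project>            # placeholder project
#SBATCH --partition=boost_usr_prod     # assumption: Booster production partition
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4            # one MPI rank per A100
#SBATCH --cpus-per-task=8              # 4 x 8 = 32 cores per node
#SBATCH --gres=gpu:4
#SBATCH --time=01:00:00
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PLACES=cores
export OMP_PROC_BIND=close
srun --cpu-bind=cores $EXE $CFG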
CINECA LEONARDO DCGP GCC
module load gcc/12.2.0 openmpi/4.1.6--gcc--12.2.0
./build_libraries.sh leonardo_dcgp
export VLASIATOR_ARCH=leonardo_dcgp
Use mpirun to launch. Note that thread placement is poor with this setup.
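Since the default placement is poor, a binding recipe analogous to the Karolina ones below may be worth trying; this is an untested sketch, so verify the resulting bindings with --report-bindings or the tools listed under Verifying thread placement.

export OMP_PLACES=cores
export OMP_PROC_BIND=close
mpirun -n $SLURM_NTASKS --map-by ppr:$SLURM_NTASKS_PER_NODE:node:PE=$OMP_NUM_THREADS --bind-to core --report-bindings $EXE $CFG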
CINECA LEONARDO DCGP with INTEL ONEAPI COMPILERS
Not yet operational; the build fails at the link stage.
module load intel-oneapi-compilers
module load intel-oneapi-mpi
./build_libraries.sh leonardo_dcgp_intel
export VLASIATOR_ARCH=leonardo_dcgp_intel
KAROLINA GPU
Use the it4ifree command to find project information.
Karolina GPU nodes have 2 x 64 CPU cores, 8 NVIDIA A100 GPUs, and 320 GB of HBM2 GPU memory per node.
module load OpenMPI/4.1.6-GCC-12.2.0-CUDA-12.4.0
module load PAPI/7.0.1-GCCcore-12.2.0
./build_libraries.sh karolina_cuda
export VLASIATOR_ARCH=karolina_cuda
Run with:
export OMP_PLACES=cores
export OMP_PROC_BIND=close
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
mpirun -n $SLURM_NTASKS --map-by ppr:$SLURM_NTASKS_PER_NODE:node:PE=$OMP_NUM_THREADS --bind-to core --report-bindings $EXE $CFG
Good placement of threads.
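For reference, the batch script around the run commands above could look like this sketch; the account and partition names are assumptions (check the IT4I documentation and the it4ifree output), and the resource counts assume one MPI rank per GPU.

#!/bin/bash
#SBATCH --account=OPEN-XX-XX           # placeholder project ID
#SBATCH --partition=qgpu               # assumption: Karolina GPU queue
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8            # one MPI rank per A100
#SBATCH --cpus-per-task=16             # 8 x 16 = 128 cores per node
#SBATCH --gpus-per-node=8
#SBATCH --time=01:00:00
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# followed by the exports and the mpirun line shown above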
KAROLINA GCC
Karolina CPU has 2 x 64 cores per node.
module load GCC/13.3.0
module load PAPI/7.1.0-GCCcore-13.3.0
module load OpenMPI/5.0.3-GCC-13.3.0
./build_libraries.sh karolina_gcc
export VLASIATOR_ARCH=karolina_gcc
Run with one of the following:
export OMP_PLACES=cores
export OMP_PROC_BIND=close
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
mpirun -n $SLURM_NTASKS --map-by ppr:$SLURM_NTASKS_PER_NODE:node:PE=$OMP_NUM_THREADS --bind-to core --report-bindings $EXE $CFG
or
export OMP_PLACES=cores
export OMP_PROC_BIND=close
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun --cpu-bind=cores --cpus-per-task=$OMP_NUM_THREADS $EXE $CFG
Good placement of threads.
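Similarly, a sketch of the surrounding batch script for the CPU nodes (account and partition names are assumptions):

#!/bin/bash
#SBATCH --account=OPEN-XX-XX           # placeholder project ID
#SBATCH --partition=qcpu               # assumption: Karolina CPU queue
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=16             # 8 x 16 = 128 cores per node
#SBATCH --time=01:00:00
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# followed by either of the launch recipes shown above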
DISCOVERER
Not yet implemented
IZUM VEGA
Not yet implemented
DEUCALION
Not yet implemented
MELUXINA
Not yet implemented
JUPITER
Not yet implemented
Verifying thread placement
To fully exploit the available performance of these clusters, it is imperative to place OpenMP threads on cores with the correct NUMA affinities. For GPU runs, threads must also be placed on cores that have good affinity with the GPU device they use. Some tools that can be used to verify thread placement are listed below:
srun --jobid=<jobid> --overlap --pty /usr/bin/bash (or run htop directly instead of bash)
rocm-smi --showtoponuma (for AMD GPUs)
nvidia-smi topo -m (for NVIDIA GPUs)
lscpu | grep NUMA
srun --mpi=pmix /appl/bin/hostinfo
export CRAY_OMP_CHECK_AFFINITY=TRUE (Cray programming environment)
module load xthi (if available)
srun --mpi=pmix -c $t -n 1 xthi
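In addition to the tools above, a quick check can be done with standard OpenMP and Linux utilities; the snippet below is a sketch that assumes a Slurm allocation and reports the task-level CPU mask rather than per-thread placement.

# OpenMP 5.0+ runtimes print each thread's binding at program start:
export OMP_DISPLAY_AFFINITY=TRUE
# Report the allowed CPU set of every MPI task:
srun bash -c 'echo "$(hostname) rank=$SLURM_PROCID cpus=$(taskset -cp $$ | cut -d: -f2)"'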