Installation instructions for Vlasiator on EuroHPC systems

Compilation instructions for various EuroHPC systems, current as of May 2025.

LUMI-G

Start from the plain login module environment as a base (do not purge modules or add module loads to your .bashrc).

module load LUMI/24.03
module load partition/G
module load cpeAMD
module load rocm/6.2.2
module load Boost/1.83.0-cpeAMD-24.03
module load papi/7.1.0.1
export PATH=$PATH:/appl/lumi/SW/LUMI-24.03/G/EB/rocm/6.2.2/bin/
./build_libraries.sh lumi_hipcc
export VLASIATOR_ARCH=lumi_hipcc
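
With the libraries built and VLASIATOR_ARCH exported, the code itself should then compile with a plain make, here and on the other systems below; the -j value is only an example, pick one that suits the login node:

make -j 16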

See LUMI-G launch instructions at https://docs.lumi-supercomputer.eu/runjobs/scheduled-jobs/lumig-job/ for job scripts
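
For orientation, a minimal LUMI-G batch-script sketch following the per-task GPU binding shown in the linked documentation. The partition name, node and core counts, and the project placeholder are assumptions to adapt; $EXE and $CFG stand for the Vlasiator binary and configuration file as in the Karolina examples below, and the recommended CPU-binding masks should be taken from the linked LUMI page:

#!/bin/bash -l
#SBATCH --job-name=vlasiator
#SBATCH --partition=standard-g        # assumption: typical LUMI-G partition
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8           # one MPI task per GCD
#SBATCH --gpus-per-node=8
#SBATCH --cpus-per-task=7             # assumption: see the LUMI docs for the CPU-binding masks
#SBATCH --time=01:00:00
#SBATCH --account=project_XXXXXXXXX   # replace with your project

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PLACES=cores
export OMP_PROC_BIND=close

# Map each MPI task to its own GPU, as in the LUMI-G documentation
cat << 'EOF' > select_gpu
#!/bin/bash
export ROCR_VISIBLE_DEVICES=$SLURM_LOCALID
exec $*
EOF
chmod +x select_gpu

srun ./select_gpu $EXE $CFG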

LUMI-C

Start from the plain login module environment as a base (do not purge modules or add module loads to your .bashrc).

module load LUMI/24.03
module load Boost/1.83.0-cpeGNU-24.03
module load partition/C
module load papi/7.1.0.1
./build_libraries.sh lumi_2403
export VLASIATOR_ARCH=lumi_2403

Use srun to launch.
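
As a starting point, a minimal LUMI-C batch-script sketch; the partition name, node/task counts, and time limit are assumptions to adapt, and $EXE / $CFG again stand for the Vlasiator binary and configuration file:

#!/bin/bash -l
#SBATCH --partition=standard          # assumption: typical LUMI-C partition
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=16            # LUMI-C nodes have 128 cores
#SBATCH --time=01:00:00
#SBATCH --account=project_XXXXXXXXX   # replace with your project

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PLACES=cores
export OMP_PROC_BIND=close

srun --cpu-bind=cores $EXE $CFG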

BSC MARENOSTRUM 5 GPP (CPU)

export VLASIATOR_ARCH=MN5_gpp

BSC MARENOSTRUM 5 ACC (GPU)

Not yet implemented

CINECA LEONARDO BOOSTER

module load gcc/12.2.0 openmpi/4.1.6--gcc--12.2.0 nvhpc/23.11 cuda/12.1
./build_libraries.sh leonardo_booster
export VLASIATOR_ARCH=leonardo_booster

Use srun to launch.

Good placement of threads (not yet fully verified).
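
A minimal Booster launch sketch, assuming one MPI task per A100; the partition and account names and the core split are assumptions to verify against the CINECA documentation:

#!/bin/bash
#SBATCH --partition=boost_usr_prod    # assumption: check against CINECA docs
#SBATCH --account=<project>
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4           # one MPI task per A100
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=8             # assumption: host cores split evenly over 4 tasks
#SBATCH --time=01:00:00

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PLACES=cores
export OMP_PROC_BIND=close

srun --cpu-bind=cores $EXE $CFG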

CINECA LEONARDO DCGP GCC

module load gcc/12.2.0 openmpi/4.1.6--gcc--12.2.0
./build_libraries.sh leonardo_dcgp
export VLASIATOR_ARCH=leonardo_dcgp

Use mpirun to launch.

Poor placement of threads by default; explicit mapping and binding flags may help (see the sketch below).
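
A possible workaround, untested here, is to pass explicit mapping and binding flags to mpirun in the same way as in the Karolina examples below:

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PLACES=cores
export OMP_PROC_BIND=close
# Explicit per-node mapping and core binding; check the result with --report-bindings
mpirun -n $SLURM_NTASKS --map-by ppr:$SLURM_NTASKS_PER_NODE:node:PE=$OMP_NUM_THREADS --bind-to core --report-bindings $EXE $CFG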

CINECA LEONARDO DCGP with INTEL ONEAPI COMPILERS

Not yet operational; the build currently fails at the linking stage.

module load intel-oneapi-compilers
module load intel-oneapi-mpi
./build_libraries.sh leonardo_dcgp_intel
export VLASIATOR_ARCH=leonardo_dcgp_intel

KAROLINA GPU

Use the it4ifree command to check your project IDs and remaining resource allocation.

Each Karolina GPU node has 2 × 64 CPU cores, 8 NVIDIA A100 accelerators, and 320 GB of HBM2 GPU memory in total.

module load OpenMPI/4.1.6-GCC-12.2.0-CUDA-12.4.0
module load PAPI/7.0.1-GCCcore-12.2.0
./build_libraries.sh karolina_cuda
export VLASIATOR_ARCH=karolina_cuda

Run with:

export OMP_PLACES=cores
export OMP_PROC_BIND=close
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
mpirun -n $SLURM_NTASKS --map-by ppr:$SLURM_NTASKS_PER_NODE:node:PE=$OMP_NUM_THREADS --bind-to core --report-bindings $EXE $CFG

Good placement of threads.
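
Putting the above together, a minimal Karolina GPU batch-script sketch; the partition name and GPU request syntax are assumptions to verify against the IT4Innovations documentation:

#!/bin/bash
#SBATCH --partition=qgpu              # assumption: verify against IT4I docs
#SBATCH --account=<project>
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8           # one MPI task per A100
#SBATCH --cpus-per-task=16            # 128 cores / 8 GPUs
#SBATCH --gpus-per-node=8             # assumption: verify the GPU request syntax
#SBATCH --time=01:00:00

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PLACES=cores
export OMP_PROC_BIND=close
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK

mpirun -n $SLURM_NTASKS --map-by ppr:$SLURM_NTASKS_PER_NODE:node:PE=$OMP_NUM_THREADS --bind-to core --report-bindings $EXE $CFG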

KAROLINA GCC

Each Karolina CPU node has 2 × 64 cores.

module load GCC/13.3.0
module load PAPI/7.1.0-GCCcore-13.3.0
module load OpenMPI/5.0.3-GCC-13.3.0
./build_libraries.sh karolina_gcc
export VLASIATOR_ARCH=karolina_gcc

Run with:

export OMP_PLACES=cores
export OMP_PROC_BIND=close
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
mpirun -n $SLURM_NTASKS --map-by ppr:$SLURM_NTASKS_PER_NODE:node:PE=$OMP_NUM_THREADS --bind-to core --report-bindings $EXE $CFG

or

export OMP_PLACES=cores
export OMP_PROC_BIND=close
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun --cpu-bind=cores --cpus-per-task=$OMP_NUM_THREADS $EXE $CFG

Good placement of threads.

DISCOVERER

Not yet implemented

IZUM VEGA

Not yet implemented

DEUCALION

Not yet implemented

MELUXINA

Not yet implemented

JUPITER

Not yet implemented

Verifying thread placement

To fully exploit the available performance of these clusters, it is imperative to place OpenMP threads on cores with the correct NUMA affinities. For GPU runs, threads must also be placed on cores with good affinity to the GPU device assigned to each task. Some tools for verifying thread placement are listed below:

srun --jobid=<jobid> --overlap --pty /usr/bin/bash (or simply htop) to inspect a running job interactively
rocm-smi --showtoponuma (for AMD GPUs)
nvidia-smi topo -m (for NVIDIA GPUs)
lscpu | grep NUMA
srun --mpi=pmix /appl/bin/hostinfo
export CRAY_OMP_CHECK_AFFINITY=TRUE (with the Cray compiler environment, prints thread affinity at program start)

module load xthi  (if available)
srun --mpi=pmix -c $t -n 1 xthi
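
If xthi is not available, a plain shell command run inside the allocation gives a rough picture of the binding; taskset and hostname are standard Linux tools, and the task and thread counts are placeholders to match your job:

srun -n $SLURM_NTASKS -c $OMP_NUM_THREADS bash -c 'echo "$(hostname) rank $SLURM_PROCID: allowed cores $(taskset -cp $$ | cut -d: -f2)"'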