Marenostrum 5 GPU

Compiler:

**NVFORTRAN** (last checked: 01/06/2025)

We need the NVIDIA HPC SDK and its compiler to generate code for the Fortran/OpenACC implementation in Horses3D.

List of modules needed to compile and run:

module purge
module load nvidia-hpc-sdk/24.11
module load metis/5.1.0-gcc
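
To verify the environment before compiling, a quick sanity check (the exact version strings reported may differ):

module list            # should show nvidia-hpc-sdk/24.11 and metis/5.1.0-gcc
nvfortran --version    # should report the NVHPC 24.11 compiler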

Then, to compile the NS (compressible Navier-Stokes) solver:

make ns COMPILER=nvfortran COMM=PARALLEL WITH_METIS=YES

or to compile the MU (multiphase) solver:

make mu COMPILER=nvfortran COMM=PARALLEL WITH_METIS=YES

If only a single GPU is used, the last two options (COMM=PARALLEL and WITH_METIS=YES) can be omitted.
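
For example, a serial single-GPU build of the NS solver then reduces to:

make ns COMPILER=nvfortran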

Slurm scripts

An example Slurm script is included below. Two scripts are needed to use MN5 properly: the first sets up the job, and the second sets the GPU affinity within the node. The queue and account details must be specified in the Slurm script for proper submission to MN5:

#!/bin/bash

### Job name on queue
#SBATCH --job-name=MN5_Horses3D_GPU

### Output and error files directory
#SBATCH -D .

### Output and error files
#SBATCH --output=out%j.out
#SBATCH --error=err%j.err

### Run configuration
#SBATCH --ntasks=1 				# Total number of MPI ranks (one per GPU)
#SBATCH --ntasks-per-node=1 	# Up to 4; runs with more than 4 ranks span several nodes
#SBATCH --cpus-per-task=20  	# MN5 standard: 20 CPUs per GPU
#SBATCH --time=00:10:00
#SBATCH --gres=gpu:1 			# Number of GPUs per node - up to 4

### Queue and account
#SBATCH --qos=acc_ehpc
#SBATCH --account=ehpc175

### MN5 modules
module purge
module load nvidia-hpc-sdk/24.11
module load metis/5.1.0-gcc

EXEC=PATH_TO_HORSES_EXECUTABLE

# For parallel runs (example: 16 ranks, 4 per node; see the note after this script)
mpirun -np 16 --map-by ppr:4:node:PE=20 --report-bindings ./mn5_bind.sh $EXEC CASE_FILE.control

# For serial runs
#srun --unbuffered $EXEC CASE_FILE.control
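
Note that the mpirun example above launches 16 ranks (4 per node over 4 nodes), while the header requests a single task; the SBATCH directives must be adjusted to match the run. A sketch of a matching header for the 16-rank case, following the per-node limits noted above (4 ranks and 4 GPUs per node, 20 CPUs per GPU):

#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=20
#SBATCH --gres=gpu:4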

The mn5_bind.sh script, defined as follows, controls the GPU, memory, and network affinity of each MPI rank:

#!/bin/bash

case ${OMPI_COMM_WORLD_LOCAL_RANK} in
  0)
    export CUDA_VISIBLE_DEVICES=0
    export UCX_NET_DEVICES=mlx5_0:1
    export OMPI_MCA_btl_openib_if_include=mlx5_0:1
    numactl --membind=0 "$@"
    ;;
  1)
    export CUDA_VISIBLE_DEVICES=1
    export UCX_NET_DEVICES=mlx5_1:1
    export OMPI_MCA_btl_openib_if_include=mlx5_1:1
    numactl --membind=1 "$@"
    ;;
  2)
    export CUDA_VISIBLE_DEVICES=2
    export UCX_NET_DEVICES=mlx5_4:1
    export OMPI_MCA_btl_openib_if_include=mlx5_4:1
    numactl --membind=2 "$@"
    ;;
  3)
    export CUDA_VISIBLE_DEVICES=3
    export UCX_NET_DEVICES=mlx5_5:1
    export OMPI_MCA_btl_openib_if_include=mlx5_5:1
    numactl --membind=3 "$@"
    ;;
esac

# Found in: https://gitlab.com/bsc_sod2d/sod2d_gitlab/-/wikis/Documentation/Running/mn5-run
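
The binding script must be executable, since mpirun invokes it directly:

chmod +x mn5_bind.sh

As a quick sanity check of the rank-to-device mapping (a minimal sketch, assuming Open MPI and numactl are available, e.g. inside an interactive allocation), the script can wrap a simple echo instead of the solver:

mpirun -np 4 ./mn5_bind.sh bash -c 'echo "rank ${OMPI_COMM_WORLD_LOCAL_RANK} -> GPU ${CUDA_VISIBLE_DEVICES}"'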

To submit the job:

sbatch run.sh
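
The job status can then be checked with the standard Slurm commands, and progress followed in the out%j.out / err%j.err files declared in the header:

squeue -u $USER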