CUDA - CliMA/slurm-buildkite GitHub Wiki
Using the system CUDA runtime library
The CUDA runtime library is a large download, so using the copy already installed on the cluster avoids significant download overhead. For CUDA.jl 4 or later, set the CUDA_Runtime_jll.jl preference version = "local". See Julia - Preferences.
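As a minimal sketch, the preference can be written to a LocalPreferences.toml file next to the project's Project.toml. This assumes the command is run from the project directory and that no LocalPreferences.toml exists there yet; the table name CUDA_Runtime_jll is taken from the package name above. Preferences are normally only picked up for packages listed in the project's deps or extras, so CUDA_Runtime_jll may need to be added there as well.

```shell
# Assumption: the current directory holds the Julia Project.toml.
# Only create the file if it does not exist, to avoid clobbering other preferences.
if [ ! -e LocalPreferences.toml ]; then
    cat > LocalPreferences.toml <<'EOF'
[CUDA_Runtime_jll]
version = "local"
EOF
else
    echo "LocalPreferences.toml already exists; add the [CUDA_Runtime_jll] entry by hand." >&2
fi
```

If the file already exists, add the same two lines to it manually instead.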
CUDA-aware MPI
Configuration
Use the following modules:
cuda/11.2 ucx/1.13.1_cuda-11.2 openmpi/4.1.5_cuda-11.2
In addition, you may need to set the following environment variables:
env:
JULIA_CUDA_MEMORY_POOL: none
OMPI_MCA_opal_warn_on_missing_libcuda: 0
- JULIA_CUDA_MEMORY_POOL: none disables the CUDA.jl memory pool: see MPI.jl known issues.
- OMPI_MCA_opal_warn_on_missing_libcuda: 0 suppresses the warning Open MPI prints when CUDA is not available (e.g. if you're using MPI on a regular CPU node).
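For an interactive session outside Buildkite, the same configuration can be sketched as plain shell commands. This is an illustration rather than the pipeline's own setup: it assumes the cluster provides the `module` command, and skips the module step where it does not.

```shell
# Load the CUDA-aware MPI stack listed above (skipped if `module` is not available).
if command -v module >/dev/null 2>&1; then
    module load cuda/11.2 ucx/1.13.1_cuda-11.2 openmpi/4.1.5_cuda-11.2
fi

# Disable the CUDA.jl memory pool (see MPI.jl known issues).
export JULIA_CUDA_MEMORY_POOL=none

# Silence Open MPI's warning when libcuda is missing, e.g. on CPU-only nodes.
export OMPI_MCA_opal_warn_on_missing_libcuda=0
```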
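Check that MPI is using GPU-to-GPU direct communication
Profile a run and make sure the MPI communication does not show DtoH/HtoD (device-to-host / host-to-device) memory operations. Their presence suggests that data is being staged through host memory instead of being sent directly between GPUs.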
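One way to run this check is sketched below. It assumes NVIDIA Nsight Systems (`nsys`) is the profiler, which the text above does not specify, and uses a hypothetical script name, `my_mpi_cuda_script.jl`. Each step is skipped if the tool it needs is not on the PATH.

```shell
# Hypothetical script name; replace with your own MPI + CUDA program.
SCRIPT="${SCRIPT:-my_mpi_cuda_script.jl}"

# Ask Open MPI whether it was built with CUDA support.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --parsable --all | grep mpi_built_with_cuda_support:value \
        || echo "mpi_built_with_cuda_support not reported by ompi_info"
fi

# Profile a 2-rank run, writing one report per rank and printing summary statistics.
if command -v nsys >/dev/null 2>&1 && [ -f "$SCRIPT" ]; then
    mpiexec -n 2 nsys profile --trace=cuda,mpi --stats=true \
        --output="report.%q{OMPI_COMM_WORLD_RANK}" \
        julia --project "$SCRIPT"
fi
```

In the resulting summary or timeline, look at the CUDA memory operations that occur around the MPI calls and check for DtoH/HtoD copies.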