Julia on LUMI - gher-uliege/Documentation GitHub Wiki

Convolutions

A simple convolution:

using Flux, AMDGPU
m = gpu(Conv((3,3), 3 => 8))            # 3x3 kernel, 3 input channels, 8 output channels
x = gpu(randn(Float32, 16, 16, 3, 1))   # width x height x channels x batch
m(x)

fails with the error:

MIOpen(HIP): Warning [Find] /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/include/miopen/sqlite_db.hpp:260: Cannot open database file:/tmp/gfx90a6e_1.1.0.udb
MIOpen(HIP): Warning [Find] /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/include/miopen/sqlite_db.hpp:260: Cannot open database file:/tmp/gfx90a6e_1.1.0.udb
MIOpen(HIP): Warning [Find] /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/include/miopen/sqlite_db.hpp:260: Cannot open database file:/tmp/gfx90a6e_1.1.0.udb
MIOpen(HIP): Warning [Find] /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/include/miopen/sqlite_db.hpp:260: Cannot open database file:/tmp/gfx90a6e_1.1.0.udb
MIOpen(HIP): Warning [Find] /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/include/miopen/sqlite_db.hpp:260: Cannot open database file:/tmp/gfx90a6e_1.1.0.udb
MIOpen Error: /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/ocl/convolutionocl.cpp:299: No suitable algorithm was found to execute the required convolution
ERROR: MIOpenException:
- status: miopenStatusUnknownError
- description: Unknown error

Workaround: point MIOpen's user database and cache at a fresh, writable directory before starting Julia:

export MIOPEN_USER_DB_PATH="/tmp/my-miopen-cache"
export MIOPEN_CUSTOM_CACHE_DIR="${MIOPEN_USER_DB_PATH}"
rm -rf "${MIOPEN_USER_DB_PATH}"
mkdir -p "${MIOPEN_USER_DB_PATH}"
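
The workaround above can be wrapped in a small shell function so that each job gets its own empty cache directory. This is only a sketch: the function name `setup_miopen_cache` and the default directory layout (user name plus Slurm job ID under `$TMPDIR` or `/tmp`) are assumptions for illustration, not part of MIOpen or the LUMI tooling.

```shell
# Hypothetical helper: create an empty cache directory and point both
# MIOpen variables from the workaround at it.
setup_miopen_cache() {
    # Assumed default: one directory per user and per Slurm job.
    # Pass a path as the first argument to override it.
    miopen_dir="${1:-${TMPDIR:-/tmp}/${USER:-user}-miopen-cache-${SLURM_JOB_ID:-nojob}}"
    rm -rf "$miopen_dir"
    mkdir -p "$miopen_dir" || return 1
    export MIOPEN_USER_DB_PATH="$miopen_dir"
    export MIOPEN_CUSTOM_CACHE_DIR="$miopen_dir"
}

setup_miopen_cache
echo "MIOpen cache: $MIOPEN_USER_DB_PATH"
```

Calling the function in the job script before launching Julia should have the same effect as the four commands above, while avoiding a fixed path that several jobs on the same node might share.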

Monitor GPU

Replace <node_name> with the name of the node and <jobid> with the ID of your Slurm job:

srun --overlap --pty --jobid=<jobid> -w <node_name> rocm-smi --showuse
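
For continuous monitoring, the same command can be combined with `watch` so that the output refreshes periodically. The sketch below assumes that `watch` is available on the compute node; the function name `monitor_gpu`, the `DRY_RUN` variable and the 5-second default interval are hypothetical choices for illustration.

```shell
# Hypothetical helper: refresh `rocm-smi --showuse` every few seconds
# on one node of a running job.
# Set DRY_RUN=1 to print the command instead of running it.
monitor_gpu() {
    if [ "$#" -lt 2 ]; then
        echo "usage: monitor_gpu <jobid> <node_name> [interval_seconds]" >&2
        return 2
    fi
    jobid="$1"
    node="$2"
    interval="${3:-5}"   # assumed default refresh interval in seconds
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "srun --overlap --pty --jobid=$jobid -w $node watch -n $interval rocm-smi --showuse"
    else
        srun --overlap --pty --jobid="$jobid" -w "$node" watch -n "$interval" rocm-smi --showuse
    fi
}
```

After sourcing the function on a login node, `monitor_gpu <jobid> <node_name>` starts the monitor and Ctrl-C stops it. Running it first with `DRY_RUN=1` shows the exact `srun` line that would be executed.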

MPI

$ module li

Currently Loaded Modules:

  1) craype-x86-rome      4) perftools-base/24.03.0                 7) craype/2.7.31.11    10) cray-libsci/24.03.0     13) lumi-tools/24.05  (S)   16) julia/1.11.2
  2) libfabric/1.15.2.0   5) xpmem/2.8.2-1.0_5.1__g84a27a5.shasta   8) cray-dsmml/0.3.0    11) PrgEnv-cray/8.5.0       14) init-lumi/0.2     (S)
  3) craype-network-ofi   6) cce/17.0.1                             9) cray-mpich/8.1.29   12) ModuleLabel/label (S)   15) Local-CSC/default (S)

cc --cray-print-opts=all

$ mpicc --cray-print-opts=all
-L/opt/cray/pe/cce/17.0.1/cce/x86_64/lib/pkgconfig/../ -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup
