Julia on LUMI - gher-uliege/Documentation GitHub Wiki
A simple convolution:
using Flux, AMDGPU; m = gpu(Conv((3,3),3 => 8)); x = gpu(randn(Float32,16,16,3,1)); m(x);
fails with the error:
MIOpen(HIP): Warning [Find] /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/include/miopen/sqlite_db.hpp:260: Cannot open database file:/tmp/gfx90a6e_1.1.0.udb
MIOpen(HIP): Warning [Find] /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/include/miopen/sqlite_db.hpp:260: Cannot open database file:/tmp/gfx90a6e_1.1.0.udb
MIOpen(HIP): Warning [Find] /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/include/miopen/sqlite_db.hpp:260: Cannot open database file:/tmp/gfx90a6e_1.1.0.udb
MIOpen(HIP): Warning [Find] /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/include/miopen/sqlite_db.hpp:260: Cannot open database file:/tmp/gfx90a6e_1.1.0.udb
MIOpen(HIP): Warning [Find] /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/include/miopen/sqlite_db.hpp:260: Cannot open database file:/tmp/gfx90a6e_1.1.0.udb
MIOpen Error: /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/ocl/convolutionocl.cpp:299: No suitable algorithm was found to execute the required convolution
ERROR: MIOpenException:
- status: miopenStatusUnknownError
- description: Unknown error
A workaround is to use a fresh MIOpen cache directory:
export MIOPEN_USER_DB_PATH="/tmp/my-miopen-cache"
export MIOPEN_CUSTOM_CACHE_DIR="$MIOPEN_USER_DB_PATH"
rm -rf "$MIOPEN_USER_DB_PATH"
mkdir -p "$MIOPEN_USER_DB_PATH"
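The warnings above report that MIOpen cannot open its database file, so it can help to confirm that the cache directory is writable before starting Julia. The following is a minimal sketch, not part of the original setup: the function name `setup_miopen_cache` and the per-user default path are assumptions made for illustration.

```shell
# Hypothetical helper (not from the original page): create a fresh MIOpen
# cache directory and export the two variables used above.
setup_miopen_cache() {
    # The default path is an assumption; pass another directory as $1.
    local dir="${1:-/tmp/my-miopen-cache-${USER:-$(id -un)}}"
    rm -rf "$dir"
    mkdir -p "$dir" || return 1
    # Fail early if the directory cannot be written to.
    [ -w "$dir" ] || { echo "not writable: $dir" >&2; return 1; }
    export MIOPEN_USER_DB_PATH="$dir"
    export MIOPEN_CUSTOM_CACHE_DIR="$dir"
}

setup_miopen_cache && echo "MIOpen cache: $MIOPEN_USER_DB_PATH"
```

Because the function exports variables, it has to be defined and called in the shell that later launches Julia, not in a subshell.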
Replace <node_name> with the actual node identifier and <jobid> with the ID of your SLURM job:
srun --overlap --pty --jobid=<jobid> -w <node_name> rocm-smi --showuse
srun --overlap --pty --jobid=<jobid> top
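When several jobs need to be checked, the commands above can be generated by a small helper. This is only a sketch: the function name `attach_cmd` is hypothetical, and it prints the command instead of running it so that it can be inspected first.

```shell
# Hypothetical helper: print the srun command that attaches a monitoring
# tool to an already running job, using the same flags as above.
attach_cmd() {
    [ "$#" -ge 2 ] || { echo "usage: attach_cmd JOBID NODE [TOOL...]" >&2; return 2; }
    local jobid="$1" node="$2"
    shift 2
    # Default tool is rocm-smi --showuse, as used above.
    [ "$#" -gt 0 ] || set -- rocm-smi --showuse
    printf 'srun --overlap --pty --jobid=%s -w %s %s\n' "$jobid" "$node" "$*"
}

attach_cmd "<jobid>" "<node_name>"
attach_cmd "<jobid>" "<node_name>" top
```

The node names of a running job can usually be looked up with `squeue -h -j <jobid> -o %N`.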
See ~/.julia/dev/FlowMatching/examples/training.sh
salloc --account=project_465001568 --partition dev-g --nodes=1 --gpus=1 --ntasks=1 --time=2:00:00 --mem-per-cpu=25G
Prepare interactive session:
export MIOPEN_USER_DB_PATH="/tmp/my-miopen-cache-$USER-$$"
export MIOPEN_CUSTOM_CACHE_DIR="$MIOPEN_USER_DB_PATH"
rm -rf "$MIOPEN_USER_DB_PATH"
mkdir -p "$MIOPEN_USER_DB_PATH"
DEPOT_FILE=$HOME/julia-depot-FlowMatching.tar.xz
DEPOT="/tmp/$(basename "${DEPOT_FILE%.tar.xz}")-$USER"
export JULIA_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
export JULIA_DEPOT_PATH="$HOME/.julia:$DEPOT:$JULIA_DEPOT_PATH"
export JULIA_HISTORY="$HOME/.julia/logs/repl_history.jl"
if [ -e Project.toml ]; then
export JULIA_PROJECT="$PWD"
fi
# https://docs.lumi-supercomputer.eu/development/compiling/prgenv/#gpu-aware-mpi
export MPICH_GPU_SUPPORT_ENABLED=1
# extract the depot on every node, but only once per node
echo "extracting $DEPOT_FILE"
srun flock --nonblock --conflict-exit-code=0 "/tmp/julia-depot-lock-$SLURM_JOBID" tar -xf "$DEPOT_FILE" -C /tmp
srun ls "$DEPOT"
if [ "$SLURM_NTASKS" -gt "1" ]; then
export PARALLEL=true
else
export PARALLEL=false
fi
#export ENABLE_JITPROFILING=1
#srun rocprofv2 --plugin perfetto --hip-trace --hsa-trace --kernel-trace -o prof julia training.jl
srun --interactive --pty julia
# or
# srun julia some_script.jl
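The `flock` line in the script above keeps several tasks on the same node from extracting the depot at the same time: the first task takes the lock and runs `tar`, and the others find the lock held and exit with code 0. The pattern can be tried without SLURM. In the sketch below, the function name `run_once` is hypothetical and temporary files stand in for the real lock and archive.

```shell
# Hypothetical helper: run a command unless another process already holds
# the lock; in that case return immediately with exit code 0.
run_once() {
    local lock="$1"
    shift
    flock --nonblock --conflict-exit-code=0 "$lock" "$@"
}

lock="$(mktemp)"
log="$(mktemp)"

# Two concurrent callers stand in for two tasks on the same node.
run_once "$lock" sh -c 'sleep 1; echo extracted >> "$1"' sh "$log" &
run_once "$lock" sh -c 'sleep 1; echo extracted >> "$1"' sh "$log" &
wait

echo "extractions: $(wc -l < "$log")"
```

The lock is held only while the command runs, so a caller that arrives after the first one has finished would run the command again. With `srun` the tasks start at about the same time, which is presumably why this is sufficient here.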
sbatch ~/projects/bin/mkdepot
$ module li
Currently Loaded Modules:
  1) craype-x86-rome      4) perftools-base/24.03.0                 7) craype/2.7.31.11    10) cray-libsci/24.03.0     13) lumi-tools/24.05 (S)    16) julia/1.11.2
  2) libfabric/1.15.2.0   5) xpmem/2.8.2-1.0_5.1__g84a27a5.shasta   8) cray-dsmml/0.3.0    11) PrgEnv-cray/8.5.0       14) init-lumi/0.2 (S)
  3) craype-network-ofi   6) cce/17.0.1                             9) cray-mpich/8.1.29   12) ModuleLabel/label (S)   15) Local-CSC/default (S)
$ cc --cray-print-opts=all
$ mpicc --cray-print-opts=all
-L/opt/cray/pe/cce/17.0.1/cce/x86_64/lib/pkgconfig/../ -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup
$ lumi-check-quota
$ lumi-quota
$ lumi-workspaces
https://docs.lumi-supercomputer.eu/runjobs/lumi_env/dailymanagement/