Julia CECI HPC - gher-uliege/Documentation GitHub Wiki

Loading Julia:

After loading the module, julia will be available:

module load Julia/1.6.1-linux-x86_64

Check the latest installed version with module spider julia.

Julia with Revise.jl

Revise.jl automatically reloads modules and files containing the functions you are editing. Since the home directory is on a network file system, it is necessary to set the JULIA_REVISE_POLL environment variable:

export JULIA_REVISE_POLL=1
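
With polling enabled, a typical workflow is to load your file with Revise's includet, so that re-saving the file updates the running session. A minimal sketch (myscript.jl and mysum are placeholder names, not part of this cluster's setup):

using Revise

# includet tracks the file: re-saving myscript.jl redefines mysum
# in the running session without restarting julia
includet("myscript.jl")

mysum([1, 2, 3])  # re-run after editing myscript.jl to see the new behavior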

Julia with Python and matplotlib

Julia can call Python functions using the PyCall.jl package. In particular, one can use matplotlib for plotting via PyPlot.jl. Here is an example using the graphical back-end TkAgg.

module load releases/2021a matplotlib/3.4.2-foss-2021a Tkinter/3.9.5-GCCcore-10.3.0
export MPLBACKEND=TkAgg
export PYTHON=$(which python) # sets the path to the full python interpreter for the installation of PyCall

julia

using Pkg
Pkg.add(["PyCall","PyPlot"])

Testing:

using PyCall
using PyPlot
# should use packages from easybuild
@show PyCall.libpython;
@show PyPlot.matplotlib;
plot(1:10) # should show a plot in a new window

Parallel computing with Julia

General information is available at: https://docs.julialang.org/en/v1/manual/distributed-computing/

Multithreading

Multithreading is analogous to OpenMP-style programming. To start an interactive session with 4 threads, one can use:

srun --cpus-per-task=4 --mem-per-cpu=1000  --time=1:00:00 --pty julia --threads=4

The value of --cpus-per-task should match --threads. In a submission script, this can be achieved automatically by using $SLURM_CPUS_PER_TASK:

export JULIA_NUM_THREADS="$SLURM_CPUS_PER_TASK"
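
For example, a minimal submission script could look as follows (the script name is a placeholder; adjust resources to your needs):

#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=1000
#SBATCH --time=1:00:00

# make the julia thread count follow the SLURM allocation
export JULIA_NUM_THREADS="$SLURM_CPUS_PER_TASK"
julia my_threaded_script.jl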

Test julia script:

@show Threads.nthreads() # should match $SLURM_CPUS_PER_TASK
Threads.@threads for i = 1:8
   println("Hello from $(Threads.threadid())")
end

Multiprocessing

Multiprocessing is possible with the built-in module Distributed. One can use the ClusterManagers.jl package, which integrates nicely with SLURM. To start an interactive session with 4 tasks/CPUs:

srun --ntasks=4 --mem-per-cpu=1000  --time=1:00:00 --pty julia
using Distributed
using ClusterManagers
addprocs(SlurmManager(parse(Int,ENV["SLURM_NTASKS"])))
for i in workers()
   host, pid = fetch(@spawnat i (gethostname(), getpid()))
   println("Hello from $host (pid=$pid)")
end

A typical output would be:

Hello from nic5-w007 (pid=184570)
Hello from nic5-w010 (pid=1495291)
Hello from nic5-w010 (pid=1495292)
Hello from nic5-w010 (pid=1495293)

NOTE: addprocs will setup a connection between all workers (which can be very slow if there are many workers). With the parametertopology = :master_worker, only the driver process, i.e. pid 1 connects to the workers. The workers do not connect to each other. See ?addprocs for more information.
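
A sketch of the call with this parameter (same setup as the interactive session above):

using Distributed
using ClusterManagers

# only pid 1 connects to the workers; workers do not connect to each other
addprocs(SlurmManager(parse(Int, ENV["SLURM_NTASKS"])); topology = :master_worker)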

MPI

Multiprocessing using the system MPI libraries.

Installation

Load the module

module load EasyBuild/2023a OpenMPI/4.1.5-NVHPC-23.7-CUDA-12.2.0
which mpiexec
# output 
# /gpfs/softs/easybuild/2023a/software/OpenMPI/4.1.5-NVHPC-23.7-CUDA-12.2.0/bin/mpiexec

Install the julia package MPI and MPIPreferences:

julia> using MPI
julia> pathof(MPI)
"/gpfs/home/acad/ulg-gher/abarth/.julia/packages/MPI/TKXAj/src/
julia> using MPIPreferences

julia> MPIPreferences.use_system_binary()
┌ Info: MPI implementation identified
│   libmpi = "libmpi_cray"
│   version_string = "MPI VERSION    : CRAY MPICH version 8.1.26.13 (ANL base 3.4a2)\nMPI BUILD INFO : Mon Apr 17 15:23 2023 (git hash 429479e)\n"
│   impl = "CrayMPICH"
│   version = v"8.1.26"
└   abi = "MPICH"
┌ Info: MPIPreferences changed
│   binary = "system"
│   libmpi = "libmpi_cray"
│   abi = "MPICH"
│   mpiexec = "mpiexec"
│   preloads = Any[]
└   preloads_env_switch = nothing

Sample MPI program in julia:

cat > test_mpi.jl  <<EOF
using MPI
MPI.Init()
comm = MPI.COMM_WORLD
println("Hello world, I am $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))")
MPI.Barrier(comm)
EOF

Run as:

srun --account=YOUR_ACCOUNT --job-name=test --partition debug --time=1:00:00  --mem-per-cpu=700M  --ntasks=4  julia test_mpi.jl

Expected output:

Hello world, I am rank 3 of 4
Hello world, I am rank 2 of 4
Hello world, I am rank 0 of 4
Hello world, I am rank 1 of 4

Tested with julia v1.11.0, MPI.jl v0.20.22, MPIPreferences.jl v0.1.11 on lucia.

For more information, visit https://juliaparallel.github.io/MPI.jl/stable/configuration/.

CUDA

Start a shell on a node with a GPU:

srun --account=dincae --job-name=install --partition gpu --gres=gpu:1 --time=1:00:00  --mem-per-cpu=20000  --ntasks=1 --cpus-per-task=4 --pty bash -i

Then start a julia session and install the packages you need, e.g. ]add CUDA cuDNN NCDatasets. To make good use of a GPU, one needs to use multiple threads in julia, e.g. by starting julia with julia -t 4 for 4 threads.
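
Once installed, a quick way to check that the GPU is usable is to run a small computation on it (requires a GPU node; a minimal sketch):

using CUDA

a = CUDA.rand(Float32, 1000)   # array allocated on the GPU
b = 2f0 .* a                   # broadcasting runs as a GPU kernel
@show sum(b)                   # reduction executed on the GPU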

CUDA+MPI

srun --account=dincae --job-name=test --partition debug-gpu --gres=gpu:1 --time=2:00:00  --mem-per-cpu=10000  --ntasks=1 --cpus-per-task=1 --pty bash -i

module load EasyBuild/2023a OpenMPI/4.1.5-NVHPC-23.7-CUDA-12.2.0 cuDNN/8.9.2.26-CUDA-12.2.0
julia --project=.
using CUDA; CUDA.set_runtime_version!(local_toolkit=true)
# restart julia for the change to take effect

julia> using  CUDA; CUDA.versioninfo()
Precompiling CUDA...
  3 dependencies successfully precompiled in 52 seconds. 97 already precompiled.
CUDA runtime 12.2, local installation
CUDA driver 12.8
NVIDIA driver 570.86.15

CUDA libraries: 
- CUBLAS: 12.2.1
- CURAND: 10.3.3
- CUFFT: 11.0.8
- CUSOLVER: 11.5.0
- CUSPARSE: 12.1.1
- CUPTI: 2023.2.0 (API 20.0.0)
- NVML: 12.0.0+570.86.15

Julia packages: 
- CUDA: 5.6.1
- CUDA_Driver_jll: 0.10.4+0
- CUDA_Runtime_jll: 0.15.5+0
- CUDA_Runtime_Discovery: 0.3.5

Toolchain:
- Julia: 1.11.2
- LLVM: 16.0.6

Preferences:
- CUDA_Runtime_jll.local: true

1 device:
  0: NVIDIA A100-SXM4-40GB (sm_80, 39.490 GiB / 40.000 GiB available)

Issues

curl_easy_setopt: 48

The following error has no consequences and can be safely ignored.

┌ Error: curl_easy_setopt: 48
└ @ Downloads.Curl /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Curl/utils.jl:36

Timed out

ERROR: Timed out waiting to read host:port string from worker.

It might be necessary to set the JULIA_WORKER_TIMEOUT environment variable (in seconds). This is the case on vega.
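
For example (the value 300 is an illustrative choice, not a cluster-specific recommendation):

```shell
# give workers more time to report back their host:port string
export JULIA_WORKER_TIMEOUT=300
echo "JULIA_WORKER_TIMEOUT=$JULIA_WORKER_TIMEOUT"
```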

Error: CUDA.jl could not find an appropriate CUDA runtime to use.

┌ Error: CUDA.jl could not find an appropriate CUDA runtime to use.
│ 
│ This can have several reasons:
│ * you are using an unsupported platform: this version of CUDA.jl
│   only supports Linux (x86_64, aarch64, ppc64le) and Windows (x86_64),
│   while your platform was identified as x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-cuda+none-julia_version+1.10.2-cuda_local+false;
│ * you precompiled CUDA.jl in an environment where the CUDA driver
│   was not available (i.e., a container, or an HPC login node).
│   in that case, you need to specify which CUDA version to use
│   by calling `CUDA.set_runtime_version!`;
│ * you requested use of a local CUDA toolkit, but not all
│   required components were discovered. try running with
│   JULIA_DEBUG=all in your environment for more details.
│ 
│ For more details, refer to the CUDA.jl documentation at
│ https://cuda.juliagpu.org/stable/installation/overview/
└ @ CUDA ~/.julia/packages/CUDA/htRwP/src/initialization.jl:82

Set the runtime manually:

julia> CUDA.set_runtime_version!(v"12.1.0") 
[ Info: Configure the active project to use CUDA 12.1; please re-start Julia for this to take effect.

References:

https://discourse.julialang.org/t/cuda-could-not-find-an-appropiate-cuda-runtime-to-use/97201/5

Help

For help and questions, contact the friendly folks at https://discourse.julialang.org/. If you get an error, please make sure to read the documentation of the relevant package.