Julia CECI HPC - gher-uliege/Documentation GitHub Wiki
Loading Julia
After loading the module, julia will be available:
module load Julia/1.6.1-linux-x86_64
Check the latest installed version with module spider julia.
Julia with Revise.jl
Revise.jl allows modules and files containing the functions you are editing to be reloaded automatically. Since the home directory is on a network file system, it is necessary to set the JULIA_REVISE_POLL environment variable:
export JULIA_REVISE_POLL=1
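A sketch of the workflow at the REPL (the file greet.jl below is purely illustrative; Revise must be installed first with ] add Revise):

```julia
using Revise

# create a small file and track it with includet:
# edits to it are reloaded automatically (via polling, see JULIA_REVISE_POLL)
write("greet.jl", "greet() = \"hello\"\n")
includet("greet.jl")
greet()
# ... edit greet.jl in your editor ...
greet()   # at the REPL, the new definition is picked up without a restart
```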
Julia with Python and matplotlib
Julia can call Python functions using the PyCall.jl package. In particular, one can use matplotlib for plotting via PyPlot.jl. Here is an example using the graphical back-end TkAgg.
module load releases/2021a matplotlib/3.4.2-foss-2021a Tkinter/3.9.5-GCCcore-10.3.0
export MPLBACKEND=TkAgg
export PYTHON=$(which python) # sets the path to the full python interpreter for the installation of PyCall
julia
using Pkg
Pkg.add(["PyCall","PyPlot"])
If PyCall was already installed, rebuild it with Pkg.build("PyCall") so that it picks up the Python interpreter set in the PYTHON environment variable.
Testing:
using PyCall
using PyPlot
# should use packages from easybuild
@show PyCall.libpython;
@show PyPlot.matplotlib;
plot(1:10) # should show a plot in a new window
Parallel computing with Julia
General information is available at: https://docs.julialang.org/en/v1/manual/distributed-computing/
Multithreading
Multithreading is equivalent to OpenMP-style programming. To start an interactive session with 4 threads one can use:
srun --cpus-per-task=4 --mem-per-cpu=1000 --time=1:00:00 --pty julia --threads=4
The --cpus-per-task option should match --threads. In a submission script, this can be achieved automatically by using $SLURM_CPUS_PER_TASK:
export JULIA_NUM_THREADS="$SLURM_CPUS_PER_TASK"
Test julia script:
@show Threads.nthreads() # should match $SLURM_CPUS_PER_TASK
Threads.@threads for i = 1:8
println("Hello from $(Threads.threadid())")
end
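For a batch job, the pieces above can be combined into a submission script. A minimal sketch, assuming the test script above is saved as test_threads.jl (job name, memory and time limits are placeholders to adapt):

```shell
#!/bin/bash
#SBATCH --job-name=test-threads
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=1000
#SBATCH --time=0:10:00

# match the number of Julia threads to the allocated CPUs
export JULIA_NUM_THREADS="$SLURM_CPUS_PER_TASK"
srun julia test_threads.jl
```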
Multiprocessing
Multiprocessing is possible with the built-in module Distributed. One can use the package ClusterManagers.jl, which integrates nicely with SLURM.
To start an interactive session with 4 tasks/CPUs:
srun --ntasks=4 --mem-per-cpu=1000 --time=1:00:00 --pty julia
using Distributed
using ClusterManagers
addprocs(SlurmManager(parse(Int,ENV["SLURM_NTASKS"])))
for i in workers()
host, pid = fetch(@spawnat i (gethostname(), getpid()))
println("Hello from $host (pid=$pid)")
end
A typical output would be:
Hello from nic5-w007 (pid=184570)
Hello from nic5-w010 (pid=1495291)
Hello from nic5-w010 (pid=1495292)
Hello from nic5-w010 (pid=1495293)
NOTE:
addprocs will set up a connection between all workers (which can be very slow if there are many workers). With the parameter topology = :master_worker, only the driver process (i.e. pid 1) connects to the workers; the workers do not connect to each other. See ?addprocs for more information.
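For illustration, the same option also works with plain local workers (here 4 processes on the current node, no SLURM involved):

```julia
using Distributed

# start 4 local workers; with topology = :master_worker only the driver
# (pid 1) connects to the workers, the workers not to each other
addprocs(4; topology = :master_worker)

@everywhere square(x) = x^2
println(sum(pmap(square, 1:10)))   # sum of squares 1..10

rmprocs(workers())
```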
MPI
Multiprocessing using the system MPI libraries.
Installation
Load the module
module load EasyBuild/2023a OpenMPI/4.1.5-NVHPC-23.7-CUDA-12.2.0
which mpiexec
# output
# /gpfs/softs/easybuild/2023a/software/OpenMPI/4.1.5-NVHPC-23.7-CUDA-12.2.0/bin/mpiexec
Install the julia packages MPI.jl and MPIPreferences.jl (e.g. with ] add MPI MPIPreferences):
julia> using MPI
julia> pathof(MPI)
"/gpfs/home/acad/ulg-gher/abarth/.julia/packages/MPI/TKXAj/src/
julia> using MPIPreferences
julia> MPIPreferences.use_system_binary()
┌ Info: MPI implementation identified
│ libmpi = "libmpi_cray"
│ version_string = "MPI VERSION : CRAY MPICH version 8.1.26.13 (ANL base 3.4a2)\nMPI BUILD INFO : Mon Apr 17 15:23 2023 (git hash 429479e)\n"
│ impl = "CrayMPICH"
│ version = v"8.1.26"
└ abi = "MPICH"
┌ Info: MPIPreferences changed
│ binary = "system"
│ libmpi = "libmpi_cray"
│ abi = "MPICH"
│ mpiexec = "mpiexec"
│ preloads = Any[]
└ preloads_env_switch = nothing
Sample MPI program in julia:
cat > test_mpi.jl <<'EOF'
using MPI
MPI.Init()
comm = MPI.COMM_WORLD
println("Hello world, I am rank $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))")
MPI.Barrier(comm)
EOF
Note that the EOF delimiter is quoted so that the shell does not try to expand $(...) inside the here-document.
Run as:
srun --account=YOUR_ACCOUNT --job-name=test --partition debug --time=1:00:00 --mem-per-cpu=700M --ntasks=4 julia test_mpi.jl
Expected output:
Hello world, I am rank 3 of 4
Hello world, I am rank 2 of 4
Hello world, I am rank 0 of 4
Hello world, I am rank 1 of 4
Tested with julia v1.11.0, MPI.jl v0.20.22, MPIPreferences.jl v0.1.11 on lucia.
For more information, visit https://juliaparallel.github.io/MPI.jl/stable/configuration/.
CUDA
Start a shell on a node with a GPU:
srun --account=dincae --job-name=install --partition gpu --gres=gpu:1 --time=1:00:00 --mem-per-cpu=20000 --ntasks=1 --cpus-per-task=4 --pty bash -i
Then start a julia session and install the packages you need, e.g. ] add CUDA cuDNN NCDatasets.
To make good use of a GPU, one needs to use multiple threads in julia, e.g. by starting julia with julia -t 4 for 4 threads.
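As a quick check that the GPU is usable, here is a minimal sketch (it requires a GPU node and the CUDA.jl package installed as above):

```julia
using CUDA

a = CUDA.rand(Float32, 1024)   # array allocated on the GPU
b = 2f0 .* a .+ 1f0            # the broadcast is executed on the GPU
println(typeof(b))             # a CuArray{Float32, 1, ...}
println(sum(b))                # reduction performed on the GPU
```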
CUDA+MPI
srun --account=dincae --job-name=test --partition debug-gpu --gres=gpu:1 --time=2:00:00 --mem-per-cpu=10000 --ntasks=1 --cpus-per-task=1 --pty bash -i
module load EasyBuild/2023a OpenMPI/4.1.5-NVHPC-23.7-CUDA-12.2.0 cuDNN/8.9.2.26-CUDA-12.2.0
julia --project=.
julia> using CUDA; CUDA.set_runtime_version!(local_toolkit=true)
Restart julia for this setting to take effect, then check the configuration:
julia> using CUDA; CUDA.versioninfo()
Precompiling CUDA...
3 dependencies successfully precompiled in 52 seconds. 97 already precompiled.
CUDA runtime 12.2, local installation
CUDA driver 12.8
NVIDIA driver 570.86.15
CUDA libraries:
- CUBLAS: 12.2.1
- CURAND: 10.3.3
- CUFFT: 11.0.8
- CUSOLVER: 11.5.0
- CUSPARSE: 12.1.1
- CUPTI: 2023.2.0 (API 20.0.0)
- NVML: 12.0.0+570.86.15
Julia packages:
- CUDA: 5.6.1
- CUDA_Driver_jll: 0.10.4+0
- CUDA_Runtime_jll: 0.15.5+0
- CUDA_Runtime_Discovery: 0.3.5
Toolchain:
- Julia: 1.11.2
- LLVM: 16.0.6
Preferences:
- CUDA_Runtime_jll.local: true
1 device:
0: NVIDIA A100-SXM4-40GB (sm_80, 39.490 GiB / 40.000 GiB available)
Issues
curl_easy_setopt: 48
The following error is without consequences and can be safely ignored.
┌ Error: curl_easy_setopt: 48
└ @ Downloads.Curl /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Downloads/src/Curl/utils.jl:36
Timed out
ERROR: Timed out waiting to read host:port string from worker.
It might be necessary to increase the JULIA_WORKER_TIMEOUT environment variable (the timeout in seconds), for example:
export JULIA_WORKER_TIMEOUT=120
This is the case on vega.
Error: CUDA.jl could not find an appropriate CUDA runtime to use.
┌ Error: CUDA.jl could not find an appropriate CUDA runtime to use.
│
│ This can have several reasons:
│ * you are using an unsupported platform: this version of CUDA.jl
│ only supports Linux (x86_64, aarch64, ppc64le) and Windows (x86_64),
│ while your platform was identified as x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-cuda+none-julia_version+1.10.2-cuda_local+false;
│ * you precompiled CUDA.jl in an environment where the CUDA driver
│ was not available (i.e., a container, or an HPC login node).
│ in that case, you need to specify which CUDA version to use
│ by calling `CUDA.set_runtime_version!`;
│ * you requested use of a local CUDA toolkit, but not all
│ required components were discovered. try running with
│ JULIA_DEBUG=all in your environment for more details.
│
│ For more details, refer to the CUDA.jl documentation at
│ https://cuda.juliagpu.org/stable/installation/overview/
└ @ CUDA ~/.julia/packages/CUDA/htRwP/src/initialization.jl:82
Set the runtime manually:
julia> CUDA.set_runtime_version!(v"12.1.0")
[ Info: Configure the active project to use CUDA 12.1; please re-start Julia for this to take effect.
References:
https://discourse.julialang.org/t/cuda-could-not-find-an-appropiate-cuda-runtime-to-use/97201/5
Help
For help and questions, contact the friendly folks at https://discourse.julialang.org/. If you get an error, please make sure to read the documentation of the relevant package.