Julia Jupyter notebook on CECI HPC

[!IMPORTANT]
Please do not forget to free CPU/GPU/... resources so that the computation time budget is not depleted unnecessarily.

Connect to the cluster with port-forwarding

You first need to configure SSH as described in the CECI documentation and the Lucia documentation.
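
For reference, the resulting ~/.ssh/config typically contains an entry similar to the sketch below; the host name, user name and key path are placeholders generated by the CECI SSH configuration wizard and may differ on your system:

# sketch of a ~/.ssh/config entry for Lucia; the values below are placeholders,
# use the configuration generated by the CECI SSH wizard
Host lucia
    HostName LUCIA_FRONTAL_HOSTNAME
    User YOUR_CECI_LOGIN
    IdentityFile ~/.ssh/ceci/id_rsa.ceci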

Assuming that you can already connect to the cluster with ssh lucia, connect from your laptop/PC with port forwarding:

ssh -L 9999:localhost:9999 lucia

If port 9999 is already in use, you can pick a different number and adapt all port numbers below accordingly. The port number should be larger than 1024 (lower ports are privileged).
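
For example, to use port 10234 instead (an arbitrarily chosen number above 1024):

ssh -L 10234:localhost:10234 lucia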

Install jupyter notebook

Run the following shell commands to install jupyter:

pip install -U --user pip
which pip
# should be ~/.local/bin/pip
pip install --user jupyter
which jupyter
# should be ~/.local/bin/jupyter
export JUPYTER=$(which jupyter)
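
If which pip does not return ~/.local/bin/pip, the directory ~/.local/bin is most likely missing from your PATH; the following shell command fixes this (add it to e.g. ~/.bashrc to make it permanent):

# prepend the user-level install directory to the search path
export PATH="$HOME/.local/bin:$PATH"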

Run the following julia commands to install IJulia:

using Pkg
Pkg.add("IJulia")
using IJulia
@show IJulia.JUPYTER

The last command gives you the path to the jupyter program, for example "/gpfs/home/acad/ulg-gher/abarth/.local/bin/jupyter".
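
If IJulia.JUPYTER does not point to the jupyter installed above, you can rebuild IJulia against it using the JUPYTER environment variable documented by IJulia (adapt the path to the output of which jupyter):

using Pkg
# point IJulia to the jupyter installed with pip above (adapt the path)
ENV["JUPYTER"] = expanduser("~/.local/bin/jupyter")
Pkg.build("IJulia")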

Run on the frontal node

Simple data preparation tasks can be done on the frontal node as long as they do not require significant resources:

jupyter notebook  --no-browser --port=9999

Then open the link printed by jupyter (e.g. http://localhost:9999/?token=LONG_LONG_TOKEN) in your web browser.

Run on compute nodes

Start jupyter notebook via the SLURM command srun, for example:

srun --account=ACCOUNT_NAME --job-name=notebook --partition debug-gpu --gres=gpu:1 --time=1:00:00  --mem-per-cpu=6000  --ntasks=1 --cpus-per-task=1 --pty jupyter notebook --no-browser --port=9999

You may need to adapt the options --account, --time, --mem-per-cpu, --ntasks, --cpus-per-task... (see https://support.ceci-hpc.be/doc/_contents/QuickStart/SubmittingJobs/SlurmTutorial.html).
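
If you prefer a batch job over an interactive srun session, an equivalent submission script could look like the following sketch (same placeholder options as above, to be adapted; save it e.g. as notebook.sh and submit it with sbatch notebook.sh):

#!/bin/bash
# sketch of a SLURM submission script equivalent to the srun command above
#SBATCH --account=ACCOUNT_NAME
#SBATCH --job-name=notebook
#SBATCH --partition=debug-gpu
#SBATCH --gres=gpu:1
#SBATCH --time=1:00:00
#SBATCH --mem-per-cpu=6000
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

jupyter notebook --no-browser --port=9999

The URL with the access token is then written to the SLURM output file (by default slurm-JOBID.out in the submission directory).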

Enable port forwarding from the frontal node to the compute node in an additional, separate SSH session by running this command on the frontal node:

ssh -L 9999:localhost:9999 cnaXYZ

where cnaXYZ is the allocated compute node as reported by the shell command squeue --me.
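
Alternatively, since the frontal node can reach the compute nodes by name, a single SSH command run on your laptop/PC should also work, forwarding the port through the frontal node directly to the compute node (replacing the two commands above; cnaXYZ as reported by squeue --me):

ssh -L 9999:cnaXYZ:9999 lucia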

[!TIP] The julia command gethostname() allows you to double-check that you are running on the correct node. With the commands using CUDA; CUDA.functional() you can check whether a CUDA GPU is available.
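
For example, in a notebook cell (the second check assumes that the CUDA package is installed in your Julia environment):

# name of the node running the kernel, e.g. "cnaXYZ"
gethostname()

# true if a CUDA GPU is available and usable
using CUDA
CUDA.functional()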


Free resources

Hit Control-C in the terminal where you launched jupyter notebook with srun. Verify that the resources are freed with the following command, to be run on the frontal node:

squeue --me

Explicitly cancel a job with scancel JOB_NUMBER if necessary.
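
Since the job was started with --job-name=notebook above, it can also be cancelled by name:

scancel --name=notebook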