Jupyter

See also Jupyter Lab Tips and the instructions for setting up SSH keys.
You can also use Jupyter notebooks with Visual Studio Code.

Jupyter is a good way to work interactively with python on remote machines.

After some initial setup (see one-time setup below), the basic idea is to:

  1. Start a jupyter lab server on the m01 node using Slurm.
  2. Connect to that server from your desktop/laptop using an ssh tunnel.

One-time setup

You will need to do the following before you can use jupyter on the cluster for the first time.

  1. Install and activate your python conda environment: see Python.

  2. Install Jupyterlab in the environment you activated: conda install -c conda-forge jupyter jupyterlab

You can install jupyterlab into any of your conda environments.
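For example, a minimal sketch assuming an existing conda environment called myenv (substitute the name of your own environment):

source activate myenv
conda install -c conda-forge jupyter jupyterlab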

Do all the above before starting to follow the instructions below.

  3. A recent change to jupyterlab has introduced an extra "one-off" step, which comes after you have followed the steps below. The first time you successfully connect your browser to a jupyterlab session, you will be asked to enter your token and set a password. The token can be found in the log file of your jupyter job (look for token= followed by a long string of characters). Copy and paste that string into the box that jupyter presents, then choose a password. You will be asked for this jupyter password whenever you connect in future. The point of the password is to stop other cluster users connecting to your jupyter sessions!

  4. Add Jupyter kernels for your own conda environments

Jupyter doesn't automatically know about your different conda environments. If you have a conda environment called 'myenv', you can let Jupyter know about it with the following two steps (in a terminal, substituting the actual name of your environment for myenv):

> source activate myenv
> python -m ipykernel install --user --name myenv --display-name "Python (myenv)"

You only need to do this once for each environment. When you have done this, you will be able to select "Python (myenv)" from the kernels menu in your notebook. The code in the notebook will then execute using the version of python in that environment, and have access to the packages you have installed.
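To check that the kernel has been registered, you can list the kernels Jupyter knows about (a standard Jupyter command; run it in the environment where you installed jupyterlab, and a myenv entry should appear alongside the default python3 kernel):

jupyter kernelspec list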

When you have followed the steps above, you're ready to start using Jupyter over SLURM, as explained in the next section.

Running a Jupyter Lab Server using SLURM

This is the best-practice way to use jupyter lab on the cluster.

First, on the cluster login node, create a job script by copying the following example. This could be called jupyter-slurm.sh and live under ~/jupyter. Note that, by default, Slurm writes log files to the directory you submit the job from; the example below overrides this with the #SBATCH --output option.

  • You need to replace MY_PORT_NUMBER with a valid port number (the default is 8888, but you should choose a favorite number between 8888 and 9999). Your session might end up on a different port if the number you ask for is taken, so you will need to check the log.
  • The line source activate NAME_OF_YOUR_CONDA_ENVIRONMENT only applies if you want to start jupyter in a specific conda environment. Note that, if you follow the environment kernel tip on Jupyter Lab Tips, you can switch environments within the jupyter session: you certainly don't need this line in that case.
#!/bin/bash
#SBATCH --partition mem
#SBATCH --nodelist m01
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 2
#SBATCH --mem 32G
#SBATCH --job-name jupyter
#SBATCH --output /data/YOUR_USERNAME/jupyter-%J.log
#SBATCH --time 3-0

module purge
module load python
module load slurm_limit_threads

source activate NAME_OF_YOUR_CONDA_ENVIRONMENT
cd /
srun jupyter lab --no-browser --port MY_PORT_NUMBER

This example requests 32GB of RAM and 2 cores for 3 days; adjust these values if you need more resources.

Note:

  • We prefer jupyter sessions to run on m01.
  • 2 is the minimum number of cores you can reserve; if you request 1 core, Slurm will reserve 2 anyway because of the way it treats hyperthreads.
  • The slurm_limit_threads module sets environment variables to correctly match the number of threads generated by multithreaded Python libraries to the number of available cores.
  • The cd / line sets the root of the jupyterlab file tree to something more useful than the directory where your job script lives: in this case, the root of our file system, from which you can access /cluster/home/yourusername and /data/yourusername.
  • See our page on Slurm for the meaning of the srun command.
  • If you're happy to overwrite the log file every time you start a new server, you can remove the -%J from the #SBATCH --output line. You can also change the path if you don't want the log file(s) in the root of your data space (for example, you could create /data/your_username/jupyter and send the logs there instead, as sketched below).
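For example, a minimal sketch of that alternative, using the placeholder paths from the example script: create a dedicated log directory once,

mkdir -p /data/YOUR_USERNAME/jupyter

and point the output line of the job script at it:

#SBATCH --output /data/YOUR_USERNAME/jupyter/jupyter-%J.log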

Submit this job script, verify that the job is running on the cluster and check its Slurm JOB_ID. For example:

sbatch jupyter-slurm.sh
squeue

Your job will stay running until it reaches its time limit (3 days in the example above) or until you cancel it with scancel {JOB ID}.
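Since the server may start on a different port than the one you requested (see the note above), it is worth checking the job's log once it is running. A minimal sketch, assuming the log path from the example script and a hypothetical job ID of 12345; look for the http:// URL lines that jupyter prints, which include the actual port:

squeue -u $USER
grep 'http://' /data/YOUR_USERNAME/jupyter-12345.log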

Connecting to your jupyter session with an SSH tunnel

First, follow the instructions for setting up an internal ssh key on the cluster.

Now, on your local machine, you should be able to connect to the jupyter server with an SSH tunnel as follows:

ssh {your account name}@fomalhaut.astr.nthu.edu.tw -L PORT:localhost:PORT ssh m01 -L PORT:localhost:PORT

where PORT is MY_PORT_NUMBER from the previous step. To save typing, you could make this into a shell script (which you could call e.g. clusterjpt):

#!/bin/sh
ssh {your account name}@fomalhaut.astr.nthu.edu.tw -L $1:localhost:$1 ssh m01 -L $1:localhost:$1

Then ./clusterjpt 8888 would connect you to your m01 jupyter session on port 8888.

If you have an entry for fomalhaut in your local .ssh/config (instructions here) you would only need ssh fomalhaut -L ... in the lines above.
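A minimal sketch of such an entry in your local ~/.ssh/config, assuming placeholder values that you should adapt:

Host fomalhaut
    HostName fomalhaut.astr.nthu.edu.tw
    User your_account_name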

:warning: The first time you connect, you will need to follow point 3 under the "one-time setup" heading at the top of this page to set a jupyter lab password.
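To find the token mentioned in that step, you can search the log of your jupyter job. A minimal sketch, assuming the log path from the example job script and a hypothetical job ID of 12345:

grep 'token=' /data/YOUR_USERNAME/jupyter-12345.log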

Troubleshooting

Slurm job runs OK, but can't connect from remote machine

channel 3: open failed: connect failed: Connection refused

There are many possible reasons for this. One possibility is that the memory node you are using is not in the list of 'trusted hosts' for ssh connections from your account on fomalhaut.

Solution: from fomalhaut, connect to the node your job is running on directly from the command line (e.g. ssh m01) and answer 'yes' to the question about trusting the host. If you are asked for a password, this will be your original login password: the one you chose after you were first given access to the cluster. (You may have set up a private ssh key to connect to fomalhaut, with its own passphrase; that passphrase is not what is being asked for here.)
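For example, a minimal sketch of that fix, run from a terminal on fomalhaut:

ssh m01
# answer 'yes' when asked whether to trust the host, then log out of m01 again
exit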

This step should not be necessary, so if it fixes your problem please let us know.