Jupyter - nthu-ioa/cluster GitHub Wiki
See also Jupyter Lab Tips and the instructions for setting up SSH keys.
You can also use Jupyter notebooks with Visual Studio Code.
Jupyter is a good way to work interactively with python on remote machines.
After some initial setup (see one-time setup below), the basic idea is to:
- Start a Jupyter Lab server on the `m01` node using Slurm.
- Connect to that server from your desktop/laptop using an SSH tunnel.
## One-time setup
You will need to do the following before you can use jupyter on the cluster for the first time.
1. Install and activate your python conda environment: see Python.

2. Install Jupyterlab in the environment you activated:

   ```
   conda install -c conda-forge jupyter jupyterlab
   ```

   You can install Jupyterlab into any of your conda environments. Do all of the above before starting to follow the instructions below.

3. A recent change to Jupyterlab has introduced an extra "one-off" step, which comes after you have followed the steps below. The first time you successfully connect your browser to a Jupyterlab session, you will be asked to enter your token and set a password. The token can be found in the log file of your jupyter job (look for `token=` followed by a long string). Copy and paste that string into the box that jupyter presents, then choose a password. You will be asked for this jupyter password whenever you connect in future. The point of the password is to stop other cluster users connecting to your jupyter sessions!

4. Add Jupyter kernels for your own conda environments. Jupyter doesn't automatically know about your different conda environments. If you have a conda environment called 'myenv', you can let Jupyter know about it with the following two steps (in a terminal, substituting the actual name of your environment for `myenv`):

   ```
   source activate myenv
   python -m ipykernel install --user --name myenv --display-name "Python (myenv)"
   ```

   You only need to do this once for each environment. When you have done this, you will be able to select "Python (myenv)" from the kernels menu in your notebook. The code in the notebook will then execute using the version of python in that environment, and will have access to the packages you have installed there.
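For the curious: the `ipykernel install` step simply writes a small spec file, and `jupyter kernelspec list` will show everything registered. Below is a rough sketch of what the spec looks like (the path and exact fields are assumptions based on typical Linux defaults and may vary between ipykernel versions):

```shell
# On Linux the spec typically lands at:
#   ~/.local/share/jupyter/kernels/myenv/kernel.json
# You can list all registered kernels with:
#   jupyter kernelspec list
# A sketch of the kernel.json contents (not exact):
spec='{
 "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
 "display_name": "Python (myenv)",
 "language": "python"
}'
echo "$spec"
```

The `display_name` field is what appears in the notebook's kernel menu, which is why the `--display-name` flag above is worth setting to something readable.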
When you have followed the steps above, you're ready to start using Jupyter over SLURM, as explained in the next section.
## Running a Jupyter Lab Server using Slurm
This is the best-practice way to use jupyter lab on the cluster.
First, on the cluster login node, create a job script by copying the following example. This could be called `jupyter-slurm.sh` and live under `~/jupyter`. Note that (by default) log files will be written to the same place as this job script.
- You need to replace `MY_PORT_NUMBER` with a valid port number (the default is 8888, but you should choose a favorite number between 8888 and 9999). Your session might end up on a different port if the number you ask for is taken, so you will need to check the log.
- The line `source activate NAME_OF_YOUR_CONDA_ENVIRONMENT` only applies if you want to start jupyter in a specific conda environment. Note that, if you follow the environment kernel tip on Jupyter Lab Tips, you can switch environments within the jupyter session: you certainly don't need this line in that case.
```bash
#!/bin/bash
#SBATCH --partition mem
#SBATCH --nodelist m01
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 2
#SBATCH --mem 32G
#SBATCH --job-name jupyter
#SBATCH --output /data/YOUR_USERNAME/jupyter-%J.log
#SBATCH --time 3-0

module purge
module load python
module load slurm_limit_threads

source activate NAME_OF_YOUR_CONDA_ENVIRONMENT

cd /
srun jupyter lab --no-browser --port MY_PORT_NUMBER
```
This example requests 32GB of RAM and 2 cores for 3 days. If you need more resources, please adjust this as needed.
Note:

- We prefer jupyter sessions to run on `m01`.
- 2 is the minimum number of cores you can reserve; if you request 1 core, Slurm will reserve 2 anyway, because of the way it treats hyperthreads.
- The `slurm_limit_threads` module sets environment variables to correctly match the number of threads generated by multithreaded Python libraries to the number of available cores.
- The `cd /` line just sets the root of the file tree in Jupyterlab to something more useful than the directory where your job script lives. In this case it sets the root of the jupyter file tree to be the root of our file system, from which you can access `/cluster/home/yourusername` and `/data/yourusername`.
- See our page on Slurm for the meaning of the `srun` command.
- If you're happy to overwrite the log file every time you start a new server, you can remove the `-%J` from the `#SBATCH --output` line. You can change the path if you don't want the log file(s) in the root of your data space (for example, you could create `/data/your_username/jupyter` and send the logs there instead).
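If you want to see whether your favorite port is already taken before submitting, one option is to check the listening sockets on the node. This is a sketch that assumes the `ss` tool from iproute2 is available (with `netstat` as a fallback); run it on `m01` rather than the login node:

```shell
# Check whether a given port already has a listener on this machine.
port=8888
if (ss -tln 2>/dev/null || netstat -tln 2>/dev/null) | grep -q ":$port "; then
  msg="port $port is in use; pick another (or let jupyter fall back automatically)"
else
  msg="port $port looks free"
fi
echo "$msg"
```

Even if you skip this check, jupyter will fall back to a nearby free port on its own, so checking the log (as noted above) is always the authoritative answer.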
Submit this job script, verify that the job is running on the cluster, and check its Slurm JOB_ID. For example:

```
sbatch jupyter-slurm.sh
squeue
```

Your job will now stay running until you kill it with `scancel {JOB ID}`.
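Once the job is running, its log file contains both the port actually used and the login token. A sketch of pulling both out with `grep`, demonstrated here on a made-up log line (the real log format varies between jupyterlab versions, and the log path below is just the pattern from the example job script):

```shell
# Hypothetical log line standing in for the real log file:
logline='[I ServerApp] http://localhost:8890/lab?token=abc123def456'
# On the cluster you would grep your actual log instead, e.g.:
#   grep -Eo 'localhost:[0-9]+|token=[0-9a-f]+' /data/YOUR_USERNAME/jupyter-JOBID.log
port=$(echo "$logline" | grep -Eo 'localhost:[0-9]+' | head -1)
token=$(echo "$logline" | grep -Eo 'token=[0-9a-f]+')
echo "$port $token"
```

Note how the port in the log (8890 here) can differ from the one you requested; use the logged port in the SSH tunnel below.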
## Connecting to your jupyter session with an SSH tunnel
First, follow the instructions for setting up an internal ssh key on the cluster.
Now, on your local machine, you should be able to connect to the jupyter server with an SSH tunnel as follows:

```
ssh {your account name}@fomalhaut.astr.nthu.edu.tw -L PORT:localhost:PORT ssh m01 -L PORT:localhost:PORT
```
where PORT is `MY_PORT_NUMBER` from the previous step. To save typing, you could make this into a shell script (which you could call e.g. `clusterjpt`):

```bash
#!/bin/sh
ssh {your account name}@fomalhaut.astr.nthu.edu.tw -L $1:localhost:$1 ssh m01 -L $1:localhost:$1
```

Then `./clusterjpt 8888` would connect you to your `m01` jupyter session on port 8888.
If you have an entry for fomalhaut in your local `.ssh/config` (instructions here), you would only need `ssh fomalhaut -L ...` in the lines above.
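A minimal example of such an entry (the linked instructions have full details; `your_username` is a placeholder for your cluster account name):

```
Host fomalhaut
    HostName fomalhaut.astr.nthu.edu.tw
    User your_username
```

With this in place, `ssh` expands `fomalhaut` to the full hostname and fills in your username automatically.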
:warning: The first time you connect, you will need to follow point 3 under the "one-time setup" heading at the top of this page to set a jupyter lab password.
## Troubleshooting

### Slurm job runs OK, but can't connect from remote machine

```
channel 3: open failed: connect failed: Connection refused
```
There are many possible reasons for this. One possibility is that the memory node you are using is not in the list of 'trusted hosts' for `ssh` connections from your account on fomalhaut.

Solution: from fomalhaut, connect to the node your job is running on directly from the command line (e.g. `ssh m01`) and answer 'yes' to the question about trusting the host. If you are asked for a password, this will be your original login password: the one you chose after you were first given access to the cluster (you may have set up a private ssh key to connect to fomalhaut, with its own passphrase -- that passphrase is not what is being asked for here).
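For background, answering 'yes' simply appends a line for the host to `~/.ssh/known_hosts` on fomalhaut, which is a plain-text file with one `hostname keytype key` entry per line. A sketch of that effect, demonstrated on a temporary file with a placeholder key (`AAAA...` is not a real key):

```shell
# known_hosts is plain text: one "hostname keytype key" line per trusted host.
kh=$(mktemp)                               # stand-in for ~/.ssh/known_hosts
grep -q '^m01[ ,]' "$kh" && before="trusted" || before="not yet trusted"
echo 'm01 ssh-ed25519 AAAA...' >> "$kh"    # what answering 'yes' effectively adds
grep -q '^m01[ ,]' "$kh" && after="trusted" || after="not yet trusted"
echo "before: $before, after: $after"
rm -f "$kh"
```

If you ever need to re-trust a node (for example after it is reinstalled), removing its line from `~/.ssh/known_hosts` and connecting once again has the same effect.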
This step should not be necessary, so if it fixes your problem please let us know.