JupyterHub Server - nthu-ioa/cluster GitHub Wiki

Work in progress!

[!WARNING]
:warning: This page is being written while the JupyterHub server is under testing and development. At the moment, it is not ready for general use. The server could stop and start at any moment, and for any period of time, without warning. If you use it during this development period, you do so at your own risk. Any important work should be done with the "old" SSH tunnel method for the time being.

For announcements (e.g. server on/off) and discussion of issues during testing, please join this Slack: https://join.slack.com/t/cicausers/shared_invite/zt-3b41zlv1c-YZfNaqlmxwgD1nWgBBwF2A

The CICA JupyterHub

We now offer a centralized JupyterHub server hosted on the CICA cluster. This can be used for everyday plotting and data analysis. It is not intended for very heavy calculations (more than a day or two of compute on more than 16 cores).

The new server is much easier to set up and access than the previously recommended way of running Jupyter notebooks on the server. Everything works entirely through your browser: there is no need to open a terminal, write and submit your own Slurm script, or create an SSH tunnel.

You need a login account on CICA to use this service. For the time being, this service is only accessible on campus (or through the VPN).

Getting started

🟢 To access the service, visit https://cica.astr.nthu.edu.tw/jupyter.

🟢 Enter your CICA username and login password.

[!NOTE]
Your login password was set when you created your CICA account. It is not the "SSH passphrase" you may also have created when setting up SSH key access to CICA.

🟢 If you see the "Start my Server" button, click it (the hub may send you directly to the next step).

🟢 Select one of the server resource options and choose "Start".

🟢 You should find yourself in the familiar JupyterLab interface.

[!TIP]
You can close your browser and reconnect later without losing your session. You may need to reload the page in the browser.

You can log out manually.

Limitations

The bandwidth between the JupyterHub server and the /lfs storage is limited. In normal use you will probably not notice, but please don't start any large file transfers between the /data* and /lfs disks from this node. Log on to the cluster via SSH as normal to do those transfers. If your notebook jobs need to do intensive IO to /lfs, please use the "mem" or "gpu" Jupyter server options.

The old way still works!

You can still start your own JupyterLab processes under Slurm and connect to them via an SSH tunnel as described here. Feel free to do this if:

You need resources not offered by the server options (e.g. more memory, multiple GPUs);
You need to do anything nontrivial with your Python environment.

Why are the server options limited to those resources and times?

The choices are intended to meet the needs of a large number of everyday users. If you have specific resource requirements, you can always set up and run a server yourself as before.

However, the choices in the menu will be kept under review. If you would like to propose any changes to the server menu, please raise an issue on the CICA GitHub. See also below under "How do I ask a question or report a problem with JupyterHub?".

How do I...?

Change server?

Go to "Hub Control Panel"
Choose "Stop Server"
Click "Start my Server" and select a different server resource option

Add my conda environment as a Jupyter kernel?

If you have a conda (or mamba) environment called (for example) myenv, you can add it as an option in the JupyterLab kernel dropdown by logging on to the cluster in a terminal as normal and executing the following commands (this may also work in a JupyterLab terminal).

module load python
source activate myenv
pip install ipykernel
python -m ipykernel install --user --name myenv --display-name "Python (myenv)"

Replace myenv with the actual name of your environment. The string after --display-name can be anything you want.
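To confirm that the kernel was registered, you can look for its definition under your home directory. The following is a minimal sketch, assuming the default `--user` install location on Linux; the function name `list_user_kernels` is made up for this example and is not part of Jupyter.

```shell
# Hypothetical helper: print the name of every kernel that has a
# kernel.json file under the per-user kernels directory.
# The default path is an assumption (the usual --user location on Linux).
list_user_kernels() {
    local base="${1:-${HOME}/.local/share/jupyter/kernels}"
    local spec
    for spec in "${base}"/*/kernel.json; do
        # Skip the unexpanded glob pattern when no kernels are installed.
        [ -f "${spec}" ] || continue
        # The kernel name is the name of the directory holding kernel.json.
        basename "$(dirname "${spec}")"
    done
}

# Usage: list_user_kernels
```

After the commands above, `myenv` should appear in the list. If the `jupyter` command is available in your environment, `jupyter kernelspec list` reports the same information, and `jupyter kernelspec remove myenv` unregisters a kernel you no longer need.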

Load environment modules?

In the previous SSH tunnel method of using Jupyter, you could include module load statements in the Slurm batch script that started your JupyterLab (or notebook) job. For example, you could include module load gcc hdf5 to give access to the HDF5 libraries and command line tools.

With the new server, you can still customise the environment of your notebook, but the approach is different. You now need to create a startup script for a particular Jupyter kernel.

For example, if your notebook uses the kernel myenv (see above), you need to find and modify the definition of this kernel (a kernel.json file, by default under $HOME/.local/share/jupyter/kernels/myenv/) so that it sets up the environment before the kernel starts.

For details see the following instructions from NERSC: https://docs.nersc.gov/services/jupyter/how-to-guides/#how-to-customize-a-kernel-with-a-helper-shell-script
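As a rough illustration of the helper-script approach, the sketch below writes a small wrapper into a kernel's directory. It assumes the kernel is called myenv, that it lives in the default per-user location, and that the `module` command is available to non-interactive bash scripts on the cluster. The function name `write_kernel_helper` and the file name `kernel-helper.sh` are choices made for this example.

```shell
# Hypothetical helper: create kernel-helper.sh inside a kernel directory.
# Arguments: the kernel directory, followed by the modules to load.
write_kernel_helper() {
    local kernel_dir="$1"
    shift
    local modules="$*"
    # The wrapper loads the requested modules, then replaces itself with
    # whatever command it was given (the real kernel launch command).
    cat > "${kernel_dir}/kernel-helper.sh" <<EOF
#!/bin/bash
module load ${modules}
exec "\$@"
EOF
    # The wrapper must be executable, or the kernel will fail to start.
    chmod u+x "${kernel_dir}/kernel-helper.sh"
}

# Usage (assumed kernel name and modules, taken from the examples on this page):
# write_kernel_helper "${HOME}/.local/share/jupyter/kernels/myenv" gcc hdf5
```

The wrapper only takes effect once the kernel definition calls it. Following the NERSC instructions, edit kernel.json in the same directory so that `"{resource_dir}/kernel-helper.sh"` becomes the first entry of the `argv` list, ahead of the existing Python command, then restart the kernel from JupyterLab.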

See the Slurm log file?

Each JupyterHub Slurm job writes a log to ${HOME}/.jupyterhub_logs. You might want to remove old logs from this directory occasionally.
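One way to tidy up this directory is to delete logs by age. This is a minimal sketch, not a site policy: the function name `prune_jupyterhub_logs` and the 30-day default are arbitrary choices for this example, and it assumes the `find` on the cluster supports `-maxdepth` and `-delete` (GNU find does).

```shell
# Hypothetical helper: delete old log files from the JupyterHub log directory.
# Arguments (both optional): the log directory and the age threshold in days.
prune_jupyterhub_logs() {
    local log_dir="${1:-${HOME}/.jupyterhub_logs}"
    local max_age_days="${2:-30}"
    # Nothing to do if the directory has not been created yet.
    [ -d "${log_dir}" ] || return 0
    # -mtime +N selects files last modified more than N days ago.
    # -print lists each file as it is deleted.
    find "${log_dir}" -maxdepth 1 -type f -mtime +"${max_age_days}" -print -delete
}

# Usage: prune_jupyterhub_logs                                  # defaults
#        prune_jupyterhub_logs "${HOME}/.jupyterhub_logs" 7     # keep one week
```

To preview what would be removed, run the same `find` command by hand without `-delete`.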

Ask a question not on this list, or report a problem with JupyterHub?

For non-urgent questions (almost all questions are non-urgent!), please use the CICA GitHub issue page. This is much more efficient than email. If you want, you can tag @apcooper in your question and add the "Jupyter" tag. Please also help others with their questions if you know the answer!