Daskhub usage - gcp4hep/analysis-cluster GitHub Wiki
Below you will find information on how to use our Dask Gateway instance, including basic snippets to get you started. You can use it either from JupyterHub or directly through Python.
Our JupyterHub instance is available at http://jupyter.gcp4hep.org. Authentication is through ATLAS IAM, so anybody in ATLAS should be able to connect.
Each notebook runs on an independent pod. When starting up JupyterHub, the user will be given a selection of available images.
The user on the notebook is jovyan (a play on "Jovian", an inhabitant of Jupiter) and the home directory is /home/jovyan.
The home directory is mapped to a private, persistent 10GB disk for personal files. When appropriate, the user disk can be expanded to a larger size by the administrator.
Users can install additional packages in their home directory, e.g. through pip install --user <package>.
Anything outside the home directory is cleaned up when the notebook is stopped.
The following figure illustrates a basic example.
- You connect to the gateway, create a cluster and get the client. You have to scale the cluster to the required size in order to start up workers.
from dask_gateway import GatewayCluster
cluster = GatewayCluster(worker_cores=1, worker_memory=2, image="xxx/yyy:zzz")
cluster.scale(1)
client = cluster.get_client()
If you want to run a Dask GPU cluster, please add the optional num_gpus=1 argument (set it to 1; other values are not supported).
cluster = GatewayCluster(worker_cores=1, worker_memory=2, image="xxx/yyy:zzz", num_gpus=1)
📝 Instructions on the available images are in the image section.
It's also possible to create a cluster by clicking on CLUSTERS ... +NEW in the left panel. However, this will generate a LOCAL cluster, i.e. one living in your personal Jupyter pod.
- You will get a Dashboard URL. Copy and paste it (including the IP/hostname part) into the top-left Dask widget if you want to enable fancy displays.
- Run your computation; in this example, interact with a Dask array.
- Shut down the cluster. If you don't shut down the cluster, it will stay around occupying resources until you disconnect.
cluster.shutdown()
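Since a forgotten `cluster.shutdown()` leaves workers running, it can help to tie the shutdown to a `with` block. The following is only a minimal sketch: `managed_cluster` and its `make_cluster` argument are hypothetical helper names, not part of dask_gateway, and the helper only assumes the cluster object offers `scale()`, `get_client()` and `shutdown()` as used above.

```python
from contextlib import contextmanager


@contextmanager
def managed_cluster(make_cluster, n_workers=1):
    """Create a cluster, scale it, and always shut it down on exit.

    make_cluster is any zero-argument callable returning an object with
    scale(), get_client() and shutdown() (hypothetical helper, see lead-in).
    """
    cluster = make_cluster()
    try:
        cluster.scale(n_workers)
        yield cluster.get_client()
    finally:
        # Runs even if the computation raises, so no idle cluster is left behind.
        cluster.shutdown()


# Intended use on the hub (not executed here; the image name is the
# placeholder used in the snippets above):
#
#   from dask_gateway import GatewayCluster
#   import dask.array as da
#
#   make = lambda: GatewayCluster(worker_cores=1, worker_memory=2, image="xxx/yyy:zzz")
#   with managed_cluster(make) as client:
#       x = da.random.random((10000, 10000), chunks=(1000, 1000))
#       print(x.mean().compute())
```

The `finally` clause is the point of the design: the cluster is released whether the computation succeeds or fails.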

⚠️ Please use the system carefully and don't leave idle clusters around.
Also note that you can reconnect to existing clusters (not shown in the previous image); you don't need a new cluster each time.
from dask_gateway import Gateway
gateway = Gateway()
clusters = gateway.list_clusters()
cluster = gateway.connect(clusters[0].name)
# RUN YOUR COMPUTATION
cluster.shutdown()
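The snippet above fails with an IndexError when no cluster is running. A possible way to cover both cases is sketched below; `get_or_create_cluster` is a hypothetical helper name, and it relies only on the `list_clusters()`, `connect()` and `new_cluster()` methods of `Gateway`.

```python
def get_or_create_cluster(gateway, **new_cluster_options):
    """Reconnect to the first running cluster, or start a new one if none exist."""
    running = gateway.list_clusters()
    if running:
        # Reuse what is already there instead of starting another cluster.
        return gateway.connect(running[0].name)
    # No cluster found: options such as image="xxx/yyy:zzz" are passed through.
    return gateway.new_cluster(**new_cluster_options)


# Intended use on the hub (not executed here):
#
#   from dask_gateway import Gateway
#   cluster = get_or_create_cluster(Gateway(), image="xxx/yyy:zzz")
#   client = cluster.get_client()
```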
In order to delete all your old clusters, you can run:
from dask_gateway import Gateway
gateway = Gateway()
clusters = gateway.list_clusters()
for cluster in clusters:
    print("Stopping cluster {0}".format(cluster.name))
    gateway.stop_cluster(cluster.name)
If you want to interact with Dask Gateway directly through Python:
- Generate a JupyterHub token from http://<JupyterHub address>/hub/token and save it.
- Then connect to Dask from your console. Dask is available at http://dask.gcp4hep.org/. Do NOT use the host/IP for JupyterHub in this case.
export JUPYTERHUB_API_TOKEN=<YOUR TOKEN>
[user@machine gke-dask]# python3
Python 3.6.8 (default, Nov 16 2020, 16:55:22)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from dask_gateway import Gateway
>>> gateway = Gateway("http://dask.gcp4hep.org/services/dask-gateway", auth='jupyterhub')
>>> cluster = gateway.new_cluster(image='xxx/yyy:zzz')
>>> client = cluster.get_client()
>>> # RUN YOUR COMPUTATION
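When the same steps are run from a script rather than an interactive session, a missing token is easy to overlook. A small check along these lines may help; `require_hub_token` is a hypothetical helper name, and the only assumption is that the token is exported as JUPYTERHUB_API_TOKEN as shown above.

```python
import os


def require_hub_token(env=None):
    """Return the JupyterHub token from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    token = env.get("JUPYTERHUB_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "JUPYTERHUB_API_TOKEN is not set; generate a token and export it first"
        )
    return token


# Intended use before connecting (not executed here):
#
#   from dask_gateway import Gateway
#   require_hub_token()
#   gateway = Gateway("http://dask.gcp4hep.org/services/dask-gateway", auth="jupyterhub")
```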
📝 Remember that the client (wherever you are running Python) and the worker image need to match and have compatible libraries installed. Our images are listed here, but you can run any other image you want that matches your client.
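One way to spot a client/worker mismatch is to compare the package versions reported by each side. The sketch below is an illustration only: `find_package_mismatches` is a hypothetical helper, and it assumes its input has the layout returned by `client.get_versions()` in distributed, i.e. "client" and "scheduler" entries plus a "workers" mapping, each holding a "packages" dict.

```python
def find_package_mismatches(versions, packages=("python", "dask", "distributed")):
    """List packages whose version differs between client, scheduler and workers.

    The layout of `versions` is an assumption, see the lead-in above.
    """
    nodes = {"client": versions["client"], "scheduler": versions["scheduler"]}
    nodes.update(versions.get("workers", {}))
    mismatches = {}
    for pkg in packages:
        # Collect the version each node reports for this package.
        seen = {name: info["packages"].get(pkg) for name, info in nodes.items()}
        if len(set(seen.values())) > 1:
            mismatches[pkg] = seen
    return mismatches


# Intended use with a connected client (not executed here):
#
#   problems = find_package_mismatches(client.get_versions())
#   if problems:
#       print("Version mismatch:", problems)
```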