# Daskhub installation

## Installation
This documentation describes our installation of Dask Gateway + JupyterHub on a Kubernetes backend, in particular using the daskhub helm chart. The official daskhub installation instructions are here. Helm and the other prerequisites have to be set up following the official documentation. The file values.yaml contains our desired configuration.
```
helm upgrade --wait --install --render-subchart-notes dhub dask/daskhub --values=values.yaml
```
## values.yaml configuration

Explanations for the relevant sections in our values.yaml:

### Tokens
The tokens are generated using:
```
openssl rand -hex 32
```
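As a sketch of where these tokens end up (assuming the standard daskhub chart layout; the exact keys may differ between chart versions), the same token is given to the JupyterHub service definition and to the Dask Gateway authentication block in values.yaml:

```yaml
# Sketch only: typical placement of the generated token in values.yaml
# for the daskhub chart. Verify the key paths against your chart version.
jupyterhub:
  hub:
    services:
      dask-gateway:
        apiToken: "<token from openssl rand -hex 32>"

dask-gateway:
  gateway:
    auth:
      jupyterhub:
        apiToken: "<same token as above>"
```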
JupyterHub
- JupyterHub is connected to ATLAS IAM. Users are created automatically the first time they connect.
- We can define the default image and a list of alternative images. More details on images can be found in the Daskhub image section.
- CVMFS can be installed on the cluster following these instructions and then mounted into the Jupyter pods.
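As an illustration only (the image names and the CVMFS volume definition below are placeholders and depend on how CVMFS is deployed on your cluster), the image selection and the CVMFS mount live under the singleuser section of the JupyterHub subchart:

```yaml
# Illustrative sketch of the jupyterhub.singleuser section; image names and
# the CVMFS volume are placeholders, not our actual configuration.
jupyterhub:
  singleuser:
    image:
      name: example-registry/analysis-notebook   # hypothetical default image
      tag: latest
    profileList:
      - display_name: "Default environment"
        default: true
      - display_name: "ML environment"
        kubespawner_override:
          image: example-registry/ml-notebook:latest   # hypothetical alternative image
    storage:
      extraVolumes:
        - name: cvmfs
          hostPath:
            path: /cvmfs   # assumes /cvmfs is available on the node after the CVMFS installation
      extraVolumeMounts:
        - name: cvmfs
          mountPath: /cvmfs
          mountPropagation: HostToContainer
```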
### Dask gateway
- Traefik service: note that the service for dask-gateway is marked as LoadBalancer. This makes it possible to connect to Dask Gateway directly from outside the K8s cluster.
- We have added CVMFS to the workers in the Dask cluster as well, although I believe it might be required in the Jupyter pod only.
- optionHandler: by default Dask Gateway does not allow users to overwrite the worker size (CPU, memory) or the image configuration. This has to be enabled explicitly, as sketched below.
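A minimal sketch of such an option handler, following the pattern from the Dask Gateway documentation (the limits, defaults and the image name are placeholders, not our actual values):

```yaml
dask-gateway:
  gateway:
    extraConfig:
      optionHandler: |
        # Expose worker size and image as user-settable cluster options.
        from dask_gateway_server.options import Options, Integer, Float, String

        def option_handler(options):
            return {
                "worker_cores": options.worker_cores,
                "worker_memory": "%fG" % options.worker_memory,
                "image": options.image,
            }

        c.Backend.cluster_options = Options(
            Integer("worker_cores", default=1, min=1, max=8, label="Worker cores"),
            Float("worker_memory", default=2, min=1, max=16, label="Worker memory (GiB)"),
            String("image", default="daskgateway/dask-gateway:latest", label="Worker image"),
            handler=option_handler,
        )
```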
## Optional Node Pool settings

It is advisable to run the Dask setup with multiple node pools offering different qualities of service:
- Main node pool with standard (non-preemptible) nodes. At the moment it is defined to be static. This is the default pool and will host the critical pods that do not tolerate interruptions well:
  - JupyterHub: necessary for running the Jupyter service.
  - Jupyter Notebook: it's not nice to kill the notebook while a user is working on it.
  - Dask Gateway: necessary for running the Dask Gateway interface.
  - Dask Scheduler: main component of each Dask cluster. If we lose it, we lose track of the whole task.
  - Dask Workers: can be scheduled to fill up the main node pool. Excess workers will spill over to the worker node pool.
- Worker node pool with preemptible nodes. This node pool has a taint and only pods that tolerate it will be scheduled on these nodes. Dask has fault tolerance built in: if a worker pod is lost, the scheduler will replace it with a new worker pod. The work done by that worker is lost, but not the whole task.
In order to set it up:
- Create the second node pool, enabling preemptible nodes (if you want) and your desired autoscaling limits. Under metadata add the taint `dask=worker` with effect `NoSchedule`. The key and value are arbitrary, but they have to match the toleration defined for the worker pods in the values.yaml template.
- Add the toleration under the worker configuration in values.yaml. Our sample configuration already includes the toleration; a sketch is shown below.
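As a rough sketch (assuming the dask-gateway subchart exposes extraPodConfig for the workers, as recent chart versions do), the toleration in values.yaml looks like:

```yaml
# Sketch of the worker toleration matching the dask=worker:NoSchedule taint.
dask-gateway:
  gateway:
    backend:
      worker:
        extraPodConfig:
          tolerations:
            - key: dask
              operator: Equal
              value: worker
              effect: NoSchedule
```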
- Edit the CVMFS daemonset and add the following toleration snippet inside the pod template. Otherwise those nodes will not mount CVMFS (you might not care).
```
$ kubectl edit daemonset cvmfs-nodeplugin -n cvmfs
...
tolerations:
- effect: NoSchedule
  key: dask
  operator: Equal
  value: worker
```
- Autoscaling for this node pool works only if you define pod disruption budgets:
```
kubectl create poddisruptionbudget konnectivity-agent --namespace=kube-system --selector k8s-app=konnectivity-agent --max-unavailable 1
kubectl create poddisruptionbudget kube-dns --namespace=kube-system --selector k8s-app=kube-dns --max-unavailable 1
kubectl create poddisruptionbudget event-exporter-gke --namespace=kube-system --selector k8s-app=event-exporter-gke --max-unavailable 1
kubectl create poddisruptionbudget metrics-server --namespace=kube-system --selector k8s-app=metrics-server --max-unavailable 1
kubectl create poddisruptionbudget konnectivity-agent-autoscaler --namespace=kube-system --selector k8s-app=konnectivity-agent-autoscaler --max-unavailable 1
```
- Critical GPU node pool. You can add node pools with different GPU flavours to your K8s cluster. Since these nodes are started exclusively for a user, it takes around 10 minutes for one to be ready (start the VM, add it to the K8s cluster, install CVMFS, download the large-ish ML image). For the GPU profiles you need to include the necessary changes (see the sketch at the end of this section):
  - GKE will only schedule pods to these nodes when the resource requests/limits specify that they need an NVIDIA GPU.
  - Use node selectors to direct the pods to specific GPU families.
- Worker GPU node pool: can be added in order to run Dask clusters with multiple GPUs. At the time of writing we do not have much experience with running TensorFlow across a Dask cluster.
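For the GPU profiles mentioned above, a rough sketch of a profileList entry (the image and the accelerator label value are placeholders; adapt them to your GPU node pools and images):

```yaml
# Hypothetical GPU profile: requests an NVIDIA GPU and pins the pod to a
# specific GPU family via the GKE accelerator node label.
jupyterhub:
  singleuser:
    profileList:
      - display_name: "GPU (ML) environment"
        description: "Starts on the GPU node pool; expect ~10 minutes for the node to come up."
        kubespawner_override:
          image: example-registry/ml-gpu-notebook:latest   # hypothetical GPU-enabled image
          extra_resource_limits:
            nvidia.com/gpu: "1"
          node_selector:
            cloud.google.com/gke-accelerator: nvidia-tesla-t4   # example GPU family label
```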