Draft RADPS Hub How to Deploy - casangi/RADPS GitHub Wiki

These instructions are a first draft on how to deploy the RADPS Hub that consists of JupyterHub, Dask Kubernetes Operator, and Prefect v3 on a bare-metal K3s cluster with Longhorn storage already configured and an accessible LDAP server. Access to the RADPS hub is limited to the NRAO internal network.

This deployment has a couple of limitations:

  • To access the different services, external IPs and node ports are used, which on our system are dynamically assigned. Consequently, if the external IPs change, the Hub will no longer work. In the future, these instructions will have to be modified so that proper domains and an ingress controller such as Traefik are used.
  • The login page of JupyterHub is served over HTTPS but uses a self-signed certificate, which should be replaced with a trusted certificate.
  • The egress control between the services is too permissive, and the cluster roles are most likely too broad. These will have to be tightened.
  • We should consider pinning Helm chart versions for reproducible deployments.

These limitations are deemed acceptable since the RADPS Hub deployment is non-production (only used for demonstration and experimentation).

Additional future work is to adapt these instructions to work on the commercial cloud.

All configuration YAML files can be found in the repo at RADPS Hub Charts.

**To Do: Add comprehensive health checks, monitoring, and debug notes.**

Setup Environment

This guide does everything in the radps-hub namespace. If you are testing a redeployment alongside an existing one, choose a different namespace.

export KUBECONFIG=~/.kube/radps-k3s.yaml
kubectl create namespace radps-hub
helm repo add jupyterhub https://hub.jupyter.org/helm-chart/
helm repo add dask https://helm.dask.org/
helm repo add prefect https://prefecthq.github.io/prefect-helm
helm repo update

Self-Signed SSL Certificate Generation and Kubernetes Secret Creation

JupyterHub will be exposed via HTTPS. For this internal deployment, a self-signed certificate will be generated using OpenSSL. To ensure browser validation without requiring DNS, the certificate's Common Name (CN) and Subject Alternative Name (SAN) will be set to one of the node's external IP addresses. The list of external IPs can be seen by looking at the Traefik service when you run kubectl get services -n kube-system. In this guide, this IP is denoted x.x.x.x (please replace it with yours).

openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 3650 -nodes \
  -subj "/CN=x.x.x.x" \
  -addext "subjectAltName=IP:x.x.x.x"

This will create cert.pem and key.pem (keep these files safe). Now, create a Kubernetes secret from these files (this secret will be referenced in the JupyterHub YAML):

kubectl create secret tls jupyterhub-tls --cert=cert.pem --key=key.pem -n radps-hub

The name of the secret is jupyterhub-tls.
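Before creating the secret, you can confirm the certificate actually carries the expected SAN, since the browser checks the IP against it. A minimal, self-contained sketch using 127.0.0.1 as a stand-in for x.x.x.x and throwaway file paths:

```shell
# Generate a short-lived demo certificate (stand-in IP: 127.0.0.1)
openssl req -x509 -newkey rsa:2048 -keyout /tmp/demo-key.pem -out /tmp/demo-cert.pem \
  -sha256 -days 1 -nodes \
  -subj "/CN=127.0.0.1" \
  -addext "subjectAltName=IP:127.0.0.1"

# Inspect the SAN extension; it should list the IP the browser will connect to
openssl x509 -in /tmp/demo-cert.pem -noout -ext subjectAltName
```

For the real deployment, run the same inspection against cert.pem and verify your node's external IP appears as an "IP Address" entry.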

Create Shared Volume

To make sharing data between Jupyter accounts easy, we can create a shared persistent volume on Longhorn. The configuration yaml:

# file: common-storage-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jupyterhub-common-storage
  namespace: radps-hub
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Ti # Adjust size as needed

Remember to change namespace: radps-hub if you are testing in a different namespace. Apply the manifest to create the PVC in the cluster:

kubectl apply -f common-storage-pvc.yaml

To check if the volume has been created, you can port forward the longhorn-frontend:

kubectl port-forward -n longhorn-system svc/longhorn-frontend 8294:80

The longhorn dashboard can then be accessed at 127.0.0.1:8294.

Role-Based Access Control

Needs review to ensure it is being done responsibly.

We need to give Jupyter and Dask the correct permissions to create Dask clusters. We do this by defining roles (RBAC: role-based access control):

# dask-jupyter-rbac.yaml (Complete Version)

# 1. The Service Account for the Jupyter pod
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dask-jupyter-role
  namespace: radps-hub
---
# 2. The Role for namespaced Dask and core resources
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dask-jupyter-role
  namespace: radps-hub
rules:
  - apiGroups: ["kubernetes.dask.org"]
    resources:
      - "daskclusters"
      - "daskworkergroups"
      - "daskjobs"
      - "daskautoscalers"
    verbs: ["get", "list", "watch", "create", "delete", "patch"]
  - apiGroups: [""]
    resources:
      - "pods"
      - "pods/log"
      - "services"
    verbs: ["get", "list", "watch", "create", "delete", "patch"]
---
# 3. The RoleBinding for the namespaced Role
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dask-jupyter-binding
  namespace: radps-hub
subjects:
  - kind: ServiceAccount
    name: dask-jupyter-role
    namespace: radps-hub
roleRef:
  kind: Role
  name: dask-jupyter-role
  apiGroup: rbac.authorization.k8s.io
---
# 4. The ClusterRole to grant permission to list nodes
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: dask-node-lister-role
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list", "watch"]
---
# 5. The ClusterRoleBinding to grant that permission to the Jupyter service account
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dask-jupyter-node-lister-binding
subjects:
- kind: ServiceAccount
  name: dask-jupyter-role
  namespace: radps-hub
roleRef:
  kind: ClusterRole
  name: dask-node-lister-role
  apiGroup: rbac.authorization.k8s.io

Now apply the roles:

kubectl apply -f dask-jupyter-rbac.yaml 
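To sanity-check the bindings, kubectl auth can-i can impersonate the service account. A verification sketch (must be run against the live cluster):

```shell
# Should print "yes": namespaced Dask permissions from the Role
kubectl auth can-i create daskclusters.kubernetes.dask.org \
  --as=system:serviceaccount:radps-hub:dask-jupyter-role -n radps-hub

# Should print "yes": node listing granted by the ClusterRoleBinding
kubectl auth can-i list nodes \
  --as=system:serviceaccount:radps-hub:dask-jupyter-role
```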

Egress Control

# jupyter-allow-egress.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-egress
  namespace: radps-hub
spec:
  podSelector: {} # An empty podSelector selects all pods in the namespace
  policyTypes:
  - Egress
  egress:
  - {} # An empty egress rule allows traffic to all destinations

Now apply the egress rules:

kubectl apply -f jupyter-allow-egress.yaml

Deploy Jupyter Hub

Update with our custom image that has all the required packages.

With the prerequisites in place, the next step is to deploy JupyterHub using the official Zero to JupyterHub (Z2JH) Helm chart. The configuration yaml file:

# jupyterhub-values.yaml
hub:
  config:
    JupyterHub:
      authenticator_class: firstuseauthenticator.FirstUseAuthenticator
    Authenticator:
      admin_users:
        - jsteeb
    FirstUseAuthenticator:
      create_users: true
      auto_login: false
      password_validator:
        pattern: '^.{8,}$'  # optional: require 8+ character passwords
        message: 'Password must be at least 8 characters.'
    KubeSpawner:
      start_timeout: 300   # matches singleuser.startTimeout below
      http_timeout: 300    # increase the HTTP readiness timeout

  db:
    type: sqlite-pvc
    pvc:
      accessModes:
        - ReadWriteOnce
      storage: 5Gi
      storageClassName: longhorn

# ==============================================================================
# II. Storage Provisioning with Longhorn
# ==============================================================================
singleuser:
  startTimeout: 300 # Increase the timeout for starting single-user servers.
  serviceAccountName: dask-jupyter-role #Needed for Dask and Prefect
  image:
    name: ghcr.io/casangi/radps-jupyter-notebook
    tag: "v0.0.6"
    pullPolicy: IfNotPresent # Container image pull policy.
  extraEnv:
    PREFECT_API_URL: "http://prefect-server:4200/api"
  cpu:
    limit: 4
    guarantee: 1
  memory:
    limit: 4G
    guarantee: 512M
  storage:
    type: dynamic
    # Set the default size for each user's home directory PVC.
    capacity: 15Gi
    dynamic:
      storageClass: longhorn
    extraVolumes:
      - name: common-data
        persistentVolumeClaim:
          claimName: jupyterhub-common-storage # the shared PVC created earlier
    extraVolumeMounts:
      - name: common-data
        mountPath: /home/jovyan/shared

# ==============================================================================
# III. Proxy and Networking Configuration
# ==============================================================================
# Configures the public-facing proxy to handle ingress traffic.
proxy:
  # Configure the Kubernetes Service for the proxy.
  service:
    # Expose the service on a port on each node in the cluster.
    type: NodePort
    # Define the specific NodePort for HTTPS traffic.
    nodePorts:
      #http: 30080 #Disabled
      https: 30443
  # Enable HTTPS termination at the proxy.
  https:
    enabled: true
    # Use a Kubernetes Secret to provide the TLS certificate and key.
    type: secret
    secret:
      name: jupyterhub-tls # Name of the secret containing the TLS certificate and key.


scheduling:
  userScheduler:
    enabled: true
  podPriority:
    enabled: true
  userPlaceholder:
    enabled: true
    replicas: 4
  userPods:
    nodeAffinity:
      matchNodePurpose: require

If you want to be a Jupyter admin, add your name to hub.config.Authenticator.admin_users. Now deploy using:

helm upgrade --install jupyterhub jupyterhub/jupyterhub --namespace radps-hub --values jupyterhub-values.yaml

JupyterHub can now be accessed at https://x.x.x.x:30443.
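The 8-character rule enforced by password_validator above can be checked locally. A quick sketch of the same pattern in Python (the is_valid helper is illustrative, not part of the chart):

```python
import re

# Same pattern as hub.config.FirstUseAuthenticator.password_validator.pattern
PATTERN = re.compile(r'^.{8,}$')

def is_valid(password: str) -> bool:
    """Return True if the password satisfies the 8+ character rule."""
    return PATTERN.match(password) is not None

print(is_valid("hunter2"))        # False: only 7 characters
print(is_valid("correct horse"))  # True: 13 characters
```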

Deploy Dask Kubernetes Operator

Not much configuration is needed to deploy (the complicated part, getting the roles correct, is already done):

helm upgrade --install dask-operator dask/dask-kubernetes-operator  --namespace radps-hub --set watchNamespace=radps-hub
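Once the operator is running, users can request Dask clusters from their notebooks. A minimal sketch, assuming the dask-kubernetes package is available in the notebook image (the cluster name is illustrative; this must be run inside the cluster, which is why the service account and RBAC above matter):

```python
from dask_kubernetes.operator import KubeCluster
from dask.distributed import Client

# Creates a DaskCluster custom resource in radps-hub; the operator
# then spawns the scheduler and worker pods on our behalf.
cluster = KubeCluster(
    name="demo-cluster",
    namespace="radps-hub",
    n_workers=2,
)
client = Client(cluster)

# Run a trivial computation, then delete the custom resource.
print(client.submit(lambda x: x + 1, 41).result())
client.close()
cluster.close()
```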

Deploy Prefect

Remember to change server.uiConfig.prefectUiApiUrl to use your node's external IP.

# prefect-server-values.yaml

# Configure the main Kubernetes Service for the Prefect server.
service:
  # Expose the service via a NodePort.
  type: NodePort
  # Specify the port for the Prefect UI/API.
  # Note: The Prefect chart uses a different structure than JupyterHub.
  # The targetPort on the service will be 4200 (the default Prefect port),
  # and we map the external NodePort 30042 to it.
  port: 4200
  nodePort: 30042

# This section configures the Prefect server itself.
server:
  # Configure the UI to correctly locate the API server.
  uiConfig:
    # This is the crucial setting. It tells the Prefect UI (frontend)
    # what URL to use to communicate with the Prefect API (backend).
    # It must match the external access point.
    prefectUiApiUrl: "http://x.x.x.x:30042/api"

# Configure the PostgreSQL database that backs the Prefect server.
# The chart uses a Bitnami PostgreSQL sub-chart.
postgresql:
  # Ensure the sub-chart is enabled.
  enabled: true
  # Configure persistence for the PostgreSQL database.
  persistence:
    enabled: true
    # Explicitly use the 'longhorn' StorageClass for the database volume.
    storageClass: "longhorn"
    # Define the size of the database volume.
    size: 30Gi

Now deploy using:

helm upgrade --install prefect-server prefect/prefect-server --namespace radps-hub --values prefect-server-values.yaml
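The relationship between the values above can be summarized as: the browser reaches the UI and API at the node's external IP on NodePort 30042, which the service maps to the Prefect container's port 4200. A small sketch (the prefect_ui_api_url helper is hypothetical, just to make the mapping explicit):

```python
def prefect_ui_api_url(node_ip: str, node_port: int = 30042) -> str:
    """Build the value for server.uiConfig.prefectUiApiUrl: the externally
    reachable API endpoint (NodePort 30042 forwards to container port 4200)."""
    return f"http://{node_ip}:{node_port}/api"

# Example with a placeholder node IP:
print(prefect_ui_api_url("10.1.2.3"))  # http://10.1.2.3:30042/api
```

This is also the URL exported to notebook pods as PREFECT_API_URL in jupyterhub-values.yaml, except that inside the cluster the service DNS name (http://prefect-server:4200/api) is used instead of the external IP.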