Draft RADPS Hub How to Deploy - casangi/RADPS GitHub Wiki
These instructions are a first draft describing how to deploy the RADPS Hub, which consists of JupyterHub, the Dask Kubernetes Operator, and Prefect v3, on a bare-metal K3s cluster with Longhorn storage already configured and an accessible LDAP server. Access to the RADPS Hub is limited to the NRAO internal network.
This deployment has a couple of limitations:
- To access the different services, external IPs and node ports are used, which for our system are dynamically assigned. Consequently, if the external IPs change, the Hub will no longer work. In the future, these instructions will have to be modified so that proper domains and an ingress controller such as Traefik are used.
- The JupyterHub login page is served over HTTPS but uses a self-signed certificate, which should be replaced with a trusted certificate.
- The egress control between the services is too permissive, and the cluster roles are most probably broader than necessary. Both will have to be tightened.
- We should consider pinning Helm chart versions for reproducible deployments.
These limitations are deemed acceptable since the RADPS Hub deployment is non-production (only used for demonstration and experimentation).
Additional future work is to adapt these instructions to work on the commercial cloud.
All configuration YAML files can be found in the repo at RADPS Hub Charts.
**To Do:** Add comprehensive health checks, monitoring, and debug notes.
Setup Environment
This guide does everything in the radps-hub namespace. If you are redeploying alongside an existing instance, choose a different namespace.
export KUBECONFIG=~/.kube/radps-k3s.yaml
kubectl create namespace radps-hub
helm repo add jupyterhub https://hub.jupyter.org/helm-chart/
helm repo add dask https://helm.dask.org/
helm repo add prefect https://prefecthq.github.io/prefect-helm
helm repo update
Self-Signed SSL Certificate Generation and Kubernetes Secret Creation
JupyterHub will be exposed via HTTPS. For this internal deployment, a self-signed certificate is generated using OpenSSL. To ensure browser validation without requiring DNS, the certificate's Common Name (CN) and Subject Alternative Name (SAN) are set to one of the nodes' external IP addresses. The list of external IPs can be seen on the Traefik service when you run kubectl get services -n kube-system. In this guide, this IP is written as x.x.x.x (please replace it).
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 3650 -nodes \
  -subj "/CN=x.x.x.x" \
  -addext "subjectAltName=IP:x.x.x.x"
This creates cert.pem and key.pem (keep these files safe). Now create a Kubernetes secret from these files (this secret is referenced in the JupyterHub values yaml):
kubectl create secret tls jupyterhub-tls --cert=cert.pem --key=key.pem -n radps-hub
The name of the secret is jupyterhub-tls.
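It is worth confirming that the SAN actually made it into the certificate, since browsers reject a certificate whose SAN does not cover the address in the URL. A quick sketch, using the documentation address 203.0.113.10 as a stand-in for x.x.x.x:

```shell
# Generate a throwaway certificate with 203.0.113.10 standing in
# for the node's real external IP.
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem \
  -sha256 -days 3650 -nodes \
  -subj "/CN=203.0.113.10" \
  -addext "subjectAltName=IP:203.0.113.10"

# Print the subject and SAN; the SAN should list IP Address:203.0.113.10.
openssl x509 -in cert.pem -noout -subject -ext subjectAltName
```

If the SAN line is missing, the -addext flag was dropped (older OpenSSL releases ignore it silently), and the browser will refuse the certificate even after an override.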
Create Shared Volume
To make sharing data between Jupyter accounts easy, we can create a shared persistent volume on Longhorn. The configuration yaml:
# file: common-storage-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jupyterhub-common-storage
  namespace: radps-hub
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Ti  # Adjust size as needed
Remember to change metadata.namespace from radps-hub if you are deploying into a different namespace for testing. Apply the manifest to create the PVC in the cluster:
kubectl apply -f common-storage-pvc.yaml
To check that the volume has been created, you can port-forward the longhorn-frontend service:
kubectl port-forward -n longhorn-system svc/longhorn-frontend 8294:80
The longhorn dashboard can then be accessed at 127.0.0.1:8294.
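Alternatively, the claim can be checked from the command line without the dashboard (this assumes the manifest above was applied unchanged):

```shell
# The PVC should report STATUS "Bound" once Longhorn has provisioned the volume.
kubectl get pvc jupyterhub-common-storage -n radps-hub
```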
Role-Based Access Control
**To Do:** Needs review to ensure the permissions granted here are appropriately scoped.
We need to give Jupyter and Dask the correct permissions to create Dask clusters. We do this by defining roles using RBAC (role-based access control):
# dask-jupyter-rbac.yaml
# 1. The ServiceAccount for the Jupyter pod
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dask-jupyter-role
  namespace: radps-hub
---
# 2. The Role for namespaced Dask and core resources
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dask-jupyter-role
  namespace: radps-hub
rules:
  - apiGroups: ["kubernetes.dask.org"]
    resources:
      - "daskclusters"
      - "daskworkergroups"
      - "daskjobs"
      - "daskautoscalers"
    verbs: ["get", "list", "watch", "create", "delete", "patch"]
  - apiGroups: [""]
    resources:
      - "pods"
      - "pods/log"
      - "services"
    verbs: ["get", "list", "watch", "create", "delete", "patch"]
---
# 3. The RoleBinding for the namespaced Role
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dask-jupyter-binding
  namespace: radps-hub
subjects:
  - kind: ServiceAccount
    name: dask-jupyter-role
    namespace: radps-hub
roleRef:
  kind: Role
  name: dask-jupyter-role
  apiGroup: rbac.authorization.k8s.io
---
# 4. The ClusterRole to grant permission to list nodes
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: dask-node-lister-role
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["list", "watch"]
---
# 5. The ClusterRoleBinding to grant that permission to the Jupyter service account
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dask-jupyter-node-lister-binding
subjects:
  - kind: ServiceAccount
    name: dask-jupyter-role
    namespace: radps-hub
roleRef:
  kind: ClusterRole
  name: dask-node-lister-role
  apiGroup: rbac.authorization.k8s.io
Now apply the roles:
kubectl apply -f dask-jupyter-rbac.yaml
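A quick way to sanity-check the bindings against the live cluster is kubectl auth can-i, impersonating the service account. Both checks should print yes if the manifest applied cleanly:

```shell
# Can the Jupyter service account create Dask clusters in the namespace?
kubectl auth can-i create daskclusters.kubernetes.dask.org \
  --as=system:serviceaccount:radps-hub:dask-jupyter-role -n radps-hub

# Can it list nodes cluster-wide (granted by the ClusterRole above)?
kubectl auth can-i list nodes \
  --as=system:serviceaccount:radps-hub:dask-jupyter-role
```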
Egress Control
As noted in the limitations above, egress is currently wide open: this NetworkPolicy allows all outbound traffic from every pod in the namespace and should eventually be tightened.
# jupyter-allow-egress.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-egress
  namespace: radps-hub
spec:
  podSelector: {}  # An empty podSelector selects all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - {}  # An empty egress rule allows traffic to all destinations
Now apply the egress rules:
kubectl apply -f jupyter-allow-egress.yaml
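To confirm the policy landed as intended, describe it and check that the Pod Selector is empty (i.e., it applies to every pod in the namespace):

```shell
# "PodSelector: <none> (Allowing the specific traffic to all pods in this namespace)"
# in the output means the policy covers all pods.
kubectl describe networkpolicy allow-all-egress -n radps-hub
```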
Deploy JupyterHub
**To Do:** Update with our custom image that has all the required packages.
With the prerequisites in place, the next step is to deploy JupyterHub using the official Zero to JupyterHub (Z2JH) Helm chart. The configuration yaml file:
# jupyterhub-values.yaml
hub:
  config:
    JupyterHub:
      authenticator_class: firstuseauthenticator.FirstUseAuthenticator
    Authenticator:
      admin_users:
        - jsteeb
    FirstUseAuthenticator:
      create_users: true
      auto_login: false
      password_validator:
        pattern: '^.{8,}$'  # Optional: require 8+ character passwords
        message: 'Password must be at least 8 characters.'
    KubeSpawner:
      start_timeout: 300  # Same value as singleuser.startTimeout below
      http_timeout: 300   # Bump the HTTP-readiness timeout
  db:
    type: sqlite-pvc
    pvc:
      accessModes:
        - ReadWriteOnce
      storage: 5Gi
      storageClassName: longhorn
# ==============================================================================
# II. Storage Provisioning with Longhorn
# ==============================================================================
singleuser:
  startTimeout: 300  # Increase the timeout for starting single-user servers.
  serviceAccountName: dask-jupyter-role  # Needed for Dask and Prefect
  image:
    name: ghcr.io/casangi/radps-jupyter-notebook
    tag: "v0.0.6"
    pullPolicy: IfNotPresent  # Container image pull policy.
  extraEnv:
    PREFECT_API_URL: "http://prefect-server:4200/api"
  cpu:
    limit: 4
    guarantee: 1
  memory:
    limit: 4G
    guarantee: 512M
  storage:
    type: dynamic
    # Set the default size for each user's home directory PVC.
    capacity: 15Gi
    dynamic:
      storageClass: longhorn
    extraVolumes:
      - name: common-data
        persistentVolumeClaim:
          claimName: jupyterhub-common-storage  # Must match the shared PVC created above
    extraVolumeMounts:
      - name: common-data
        mountPath: /home/jovyan/shared
# ==============================================================================
# III. Proxy and Networking Configuration
# ==============================================================================
# Configures the public-facing proxy to handle ingress traffic.
proxy:
  # Configure the Kubernetes Service for the proxy.
  service:
    # Expose the service on a port on each node in the cluster.
    type: NodePort
    # Define the specific NodePort for HTTPS traffic.
    nodePorts:
      # http: 30080  # Disabled
      https: 30443
  # Enable HTTPS termination at the proxy.
  https:
    enabled: true
    # Use a Kubernetes Secret to provide the TLS certificate and key.
    type: secret
    secret:
      name: jupyterhub-tls  # Name of the secret containing the TLS certificate and key.
scheduling:
  userScheduler:
    enabled: true
  podPriority:
    enabled: true
  userPlaceholder:
    enabled: true
    replicas: 4
  userPods:
    nodeAffinity:
      matchNodePurpose: require
If you want to be a JupyterHub admin, add your username to hub.config.Authenticator.admin_users. Now deploy using:
helm upgrade --install jupyterhub jupyterhub/jupyterhub --namespace radps-hub --values jupyterhub-values.yaml
JupyterHub can now be accessed at https://x.x.x.x:30443 (your browser will warn about the self-signed certificate; this is expected).
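Before opening the browser, the deployment can be checked from the command line (the pod labels and the proxy-public service name are the Z2JH chart's defaults):

```shell
# hub, proxy, and user-placeholder pods should all reach Running.
kubectl get pods -n radps-hub -l 'component in (hub, proxy, user-placeholder)'

# Confirm the proxy's HTTPS NodePort is the expected 30443.
kubectl get svc proxy-public -n radps-hub
```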
Deploy Dask Kubernetes Operator
Not a lot of configuration is needed to deploy the operator (the complicated part, getting the roles correct, is already done):
helm upgrade --install dask-operator dask/dask-kubernetes-operator --namespace radps-hub --set watchNamespace=radps-hub
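To verify the operator is in place, check that its custom resource definitions were registered and its pod is running (the pod label below is an assumption based on standard Helm chart conventions):

```shell
# The operator registers CRDs such as daskclusters.kubernetes.dask.org;
# all of them should appear here.
kubectl get crds | grep kubernetes.dask.org

# The operator pod itself should be Running.
kubectl get pods -n radps-hub -l app.kubernetes.io/name=dask-kubernetes-operator
```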
Deploy Prefect
Remember to change server.uiConfig.prefectUiApiUrl to match one of your nodes' external IPs (written as x.x.x.x below).
# prefect-server-values.yaml
# Configure the main Kubernetes Service for the Prefect server.
service:
  # Expose the service via a NodePort.
  type: NodePort
  # Specify the port for the Prefect UI/API.
  # Note: the Prefect chart uses a different structure than JupyterHub.
  # The targetPort on the service will be 4200 (the default Prefect port),
  # and we map the external NodePort 30042 to it.
  port: 4200
  nodePort: 30042
# This section configures the Prefect server itself.
server:
  # Configure the UI to correctly locate the API server.
  uiConfig:
    # This is the crucial setting. It tells the Prefect UI (frontend)
    # what URL to use to communicate with the Prefect API (backend).
    # It must match the external access point.
    prefectUiApiUrl: "http://x.x.x.x:30042/api"
# Configure the PostgreSQL database that backs the Prefect server.
# The chart uses a Bitnami PostgreSQL sub-chart.
postgresql:
  # Ensure the sub-chart is enabled.
  enabled: true
  # Configure persistence for the PostgreSQL database.
  persistence:
    enabled: true
    # Explicitly use the 'longhorn' StorageClass for the database volume.
    storageClass: "longhorn"
    # Define the size of the database volume.
    size: 30Gi
Now deploy using:
helm upgrade --install prefect-server prefect/prefect-server --namespace radps-hub --values prefect-server-values.yaml
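Once the pods are up, the Prefect API can be probed from any machine on the internal network (assuming the NodePort 30042 from the values file above; replace x.x.x.x as before):

```shell
# The Prefect server exposes a health endpoint under /api;
# a healthy server answers with HTTP 200.
curl -i http://x.x.x.x:30042/api/health
```

The UI itself should then load at http://x.x.x.x:30042.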