Ray in Kubernetes

To deploy a Ray cluster in Kubernetes, we have created a custom Helm chart here.

This Helm chart creates a RayService Kubernetes resource, which is the main Ray configuration file in the project.

The RayService documentation is available in the official Ray docs.

Below, we are going to explain the main parts of the configuration:

General configuration

  • These settings let the KubeRay operator check that everything is working well before any task is run. If the service or its deployments stay unhealthy for longer than 180 seconds, the operator re-deploys the cluster head node.
serviceUnhealthySecondThreshold: 180
deploymentUnhealthySecondThreshold: 180
  • enableInTreeAutoscaling enables Ray's built-in autoscaling for the cluster. KubeRay injects an autoscaler sidecar container into the head pod, so keep an eye on the resources available on the head node, because the sidecar consumes additional CPU/RAM (see the sketch after this list).
enableInTreeAutoscaling: true
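
To see how these options fit together, here is a rough sketch of the rendered RayService spec. The field names follow the KubeRay RayService/RayCluster API; the autoscalerOptions block, the metadata name, and the concrete numbers are illustrative assumptions rather than the chart's actual defaults:

apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: morpheus-ray                        # hypothetical release name
spec:
  serviceUnhealthySecondThreshold: 180      # re-deploy if the service stays unhealthy this long
  deploymentUnhealthySecondThreshold: 180   # same check for the Serve deployments
  rayClusterConfig:
    enableInTreeAutoscaling: true
    # Optional: bound the resources of the autoscaler sidecar container that
    # KubeRay injects into the head pod, so it cannot starve the head node.
    autoscalerOptions:
      upscalingMode: Default
      idleTimeoutSeconds: 60
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "1"
          memory: "1Gi"

Bounding the sidecar this way is also why the head node's num-cpus is usually set one below the physical CPU count, as explained in the next section.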

Head nodes

  • In the rayStartParams block it is important to set the number of CPUs and GPUs correctly. Normally you would configure all of the resources available on the node, but if you are using the autoscaling feature the number of CPUs should be the available count minus 1, because the autoscaler sidecar reserves one CPU for itself (see the example values after this section).
rayStartParams:
   port: '6379'
   dashboard-host: '0.0.0.0'
   num-cpus: "{{ .Values.worker.head.rayStartParams.numCpus }}"
   num-gpus: "{{ .Values.worker.head.rayStartParams.numGpus }}"
  • nodeSelector is an important setting if you need to deploy the cluster on a specific group of instances. In the default codebase it uses the "gpu-adv" label, also shown in the example values below.
nodeSelector:
   morpheus-type: "{{ .Values.worker.headNodeSelector }}"
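
The template paths above map to the chart's values file. Here is a minimal values.yaml sketch for the head node, assuming an 8-CPU / 1-GPU instance group; the CPU/GPU numbers are examples, while gpu-adv is the default label mentioned above:

worker:
  headNodeSelector: "gpu-adv"   # head pod only schedules on nodes labeled morpheus-type=gpu-adv
  head:
    rayStartParams:
      numCpus: "7"              # 8 physical CPUs minus 1 reserved for the autoscaler sidecar
      numGpus: "1"

For the nodeSelector to match, the target nodes must carry the label, e.g. kubectl label nodes <node-name> morpheus-type=gpu-adv.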

Workers

  • In the workerGroupSpecs block you can define the number of worker replicas you want in your cluster. Setting replicas does not guarantee that exact number; a new replica is only created when a machine with the required resources is available. You can also define a minimum and a maximum number of replicas, which serve as bounds for the autoscaler (see the example values after this section).
workerGroupSpecs:
  - replicas: {{ .Values.worker.workerGroup.replicas }}
    minReplicas: {{ .Values.worker.workerGroup.minReplicas }}
    maxReplicas: {{ .Values.worker.workerGroup.maxReplicas }}
  • The most important part of the worker configuration is to set the limits block to all of the resources available on the machines. If your instance has 10 CPUs and 2 GPUs, those are the values that should go in this block. Ray reads these limits and schedules tasks and actors onto the workers accordingly (see the example values after this section).
containers:
  - name: ray-worker
    image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
    imagePullPolicy: IfNotPresent
    lifecycle:
       preStop:
          exec:
             command: ["/bin/sh","-c","ray stop"]
    resources:
       limits:
          cpu: "{{ .Values.worker.workerGroup.resources.limits.cpu }}"
          nvidia.com/gpu: "{{ .Values.worker.workerGroup.resources.limits.gpu }}"
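
Tying the worker settings together, here is a values.yaml sketch matching the template paths above, assuming an instance type with 10 CPUs and 2 GPUs (the replica counts are illustrative, not the project defaults):

worker:
  workerGroup:
    replicas: 1        # desired replicas at deploy time
    minReplicas: 0     # lower bound for the autoscaler
    maxReplicas: 4     # upper bound for the autoscaler
    resources:
      limits:
        cpu: 10
        gpu: 2         # rendered into the nvidia.com/gpu limit above

With enableInTreeAutoscaling turned on, Ray scales the worker group between minReplicas and maxReplicas based on the resource demand of pending tasks and actors, and the preStop hook (ray stop) lets workers shut down gracefully when the group scales back down.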