InferX platform k8s deployment - inferx-net/inferx GitHub Wiki

System Requirements

  • OS: Linux kernel > 5.8.0; tested on Ubuntu 20.04, 22.04, and 24.04
  • Processor: x86-64/AMD64
  • Docker: > 17.09.0
  • KVM: bare metal, or a VM with nested virtualization; enable virtualization technology in the BIOS (usually under the Security tab)
  • Memory: >= 64GB
  • CUDA: >= 12.5
  • Kubernetes: K8s or K3s

The deployment uses the following container images:
    1. inferx/inferx_dashboard:v0.2.1: the InferX web UI dashboard.
    2. inferx/inferx_platform:v0.2.1: the InferX platform services, such as the REST API gateway, scheduler, etc.
    3. inferx/inferx_na:v0.2.1: the InferX nodeagent and ixproxy services.
    4. inferx/inferx_postgres:v0.2.1: the InferX database services.
    5. quay.io/keycloak/keycloak:latest: the Keycloak image, used for authentication.
    6. quay.io/coreos/etcd:v3.5.13: etcd, which stores InferX configurations such as tenants, namespaces, and model functions.
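Before installing anything, the kernel requirement above can be checked with `sort -V`, which orders version strings correctly. This is a minimal sketch: the `have=` line hard-codes an example value, and in practice you would take it from `uname -r`.

```shell
#!/bin/sh
required="5.8.0"
# in practice: have=$(uname -r | cut -d- -f1); hard-coded here as an example
have="5.15.0"
# sort -V sorts version strings numerically per component; if the required
# version sorts first (or equal), the running kernel is new enough
lowest=$(printf '%s\n%s\n' "$required" "$have" | sort -V | head -n1)
if [ "$lowest" = "$required" ]; then
  echo "kernel OK"
else
  echo "kernel $have is older than required $required"
fi
```

Note that plain string comparison would get this wrong (`5.15` sorts before `5.8` lexicographically), which is why `sort -V` is used.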

Deploy InferX services

  1. Set up the k3s cluster (skip if the cluster is already set up)
# install the k3s cluster
curl -sfL https://get.k3s.io | sh -

# on Ubuntu 22.04 the following is needed.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=containerd

if ! command -v helm &> /dev/null; then
  echo "[+] Installing Helm..."
  curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
fi

helm repo add nvidia https://nvidia.github.io/gpu-operator
helm repo update
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
sudo chmod 555 /etc/rancher/k3s/k3s.yaml
helm install --wait gpu-operator nvidia/gpu-operator   -n gpu-operator --create-namespace   --set driver.enabled=false

sudo systemctl restart k3s

# at this point, "kubectl describe node" should show "nvidia.com/gpu: <GPU count>" in both the Capacity and Allocatable sections.
  2. Label the GPU node
kubectl label node <NodeName> inferx_storage=data --overwrite
kubectl label node <NodeName> inferx_nodeType=inferx_file --overwrite
  3. Clone the inferx repo:
git clone https://github.com/inferx-net/inferx.git
cd inferx
  4. Adjust the nodeagent parameters: update k8s/nodeagent.yaml and k8s/ixproxy.yaml

4.1 Update the memory size: the model containers' resources are allocated from the nodeagent pod, so update the memory request/limit to a size suitable for the node's memory:

resources:
  requests:
    cpu: "20"
    memory: "180Gi"   # regular memory request (RAM)
  limits:
    cpu: "20"
    memory: "180Gi"   # regular memory limit (RAM)
    nvidia.com/gpu: 1

4.2 Update the cache size: CACHE_MEMORY is the amount of CPU memory used to cache models. It is taken out of the memory size set in 4.1, so make sure it is no more than 50% of that value:

- name: CACHE_MEMORY
  value: "90Gi"
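The 50% constraint between the two settings above can be checked with a couple of lines of shell arithmetic. This is just a sanity-check sketch; the values are the defaults shown in the snippets above.

```shell
#!/bin/sh
NODE_MEM_GI=180   # nodeagent memory request from step 4.1, in Gi
CACHE_MEM_GI=90   # CACHE_MEMORY from step 4.2, in Gi
# the cache must not exceed 50% of the nodeagent memory request
if [ $((CACHE_MEM_GI * 2)) -le "$NODE_MEM_GI" ]; then
  echo "cache size OK"
else
  echo "CACHE_MEMORY must be <= 50% of the nodeagent memory request"
fi
```

With the default 90Gi cache against a 180Gi request, the check passes at exactly the 50% limit.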

  5. Start the pods: in the inferx folder, run
make runkblob
  6. Check the website at http://<NodeIP>:31250/demo/

Deploy models

  1. Use the following command to download a model (replace Qwen/Qwen2.5-Coder-1.5B-Instruct with the model you want to deploy)
sudo docker run --rm --network host -v /opt/inferx/cache:/models     inferx/inferx_hfdownload:v0.1.0 Qwen/Qwen2.5-Coder-1.5B-Instruct
  2. Set up the environment variables
export INFX_GATEWAY_URL="http://localhost:31501" # inferx_one exposes 31501 as the node port
export IFERX_APIKEY="87831cdb-d07a-4dc1-9de0-fb232c9bf286" # the admin apikey, configured in the nodeagent yaml
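A quick format check on the apikey can catch copy/paste mistakes before the ixctl calls below fail with an authentication error. This is a sketch and not part of the InferX tooling; it only verifies that the value looks like a UUID (8-4-4-4-12 hex digits).

```shell
#!/bin/sh
IFERX_APIKEY="87831cdb-d07a-4dc1-9de0-fb232c9bf286"
# verify the 8-4-4-4-12 hex-digit UUID shape
if echo "$IFERX_APIKEY" | grep -Eiq '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'; then
  echo "apikey format OK"
else
  echo "apikey does not look like a UUID" >&2
fi
```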
  3. Submit the first model
cd inferx/config
/opt/inferx/bin/ixctl create public.json # create tenant
/opt/inferx/bin/ixctl create Qwen_namespace.json # create namespace
/opt/inferx/bin/ixctl create Qwen2.5-Coder-1.5B-Instruct.json # create first model