InferX platform k8s deployment - inferx-net/inferx GitHub Wiki

System Requirements

  • OS: Linux kernel > 5.8.0; tested on Ubuntu 20.04, 22.04, and 24.04
  • Processor: x86-64/AMD64
  • Docker: > 17.09.0
  • KVM: bare metal, or a VM with nested virtualization; enable virtualization technology in the BIOS (usually under the Security tab)
  • Memory: >= 64GB
  • CUDA: >= 12.5
  • Kubernetes: K8s or K3s

The deployment uses the following container images:
    1. inferx/inferx_dashboard:v0.2.1: the InferX web UI dashboard.
    2. inferx/inferx_platform:v0.2.1: the InferX platform services, such as the REST API gateway, scheduler, etc.
    3. inferx/inferx_na:v0.2.1: the InferX nodeagent and ixproxy services.
    4. inferx/inferx_postgres:v0.2.1: the InferX database services.
    5. quay.io/keycloak/keycloak:latest: the Keycloak image, used for authentication.
    6. quay.io/coreos/etcd:v3.5.13: etcd, which stores InferX configurations such as tenants, namespaces, and model functions.
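Before installing anything, the kernel requirement above can be checked with `sort -V`, which orders version strings correctly. This is a minimal sketch: the `have=` line hard-codes an example value, and in practice you would take it from `uname -r`.

```shell
#!/bin/sh
required="5.8.0"
# in practice: have=$(uname -r | cut -d- -f1); hard-coded here as an example
have="5.15.0"
# sort -V sorts version strings numerically per component; if the required
# version sorts first (or equal), the running kernel is new enough
lowest=$(printf '%s\n%s\n' "$required" "$have" | sort -V | head -n1)
if [ "$lowest" = "$required" ]; then
  echo "kernel OK"
else
  echo "kernel $have is older than required $required"
fi
```

Note that plain string comparison would get this wrong (`5.15` sorts before `5.8` lexicographically), which is why `sort -V` is used.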

Deploy InferX services

  1. Set up the k3s cluster (skip if the cluster is already set up)
# install the k3s cluster
curl -sfL https://get.k3s.io | sh -

# on Ubuntu 22.04 the following is needed.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=containerd

if ! command -v helm &> /dev/null; then
  echo "[+] Installing Helm..."
  curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
fi

helm repo add nvidia https://nvidia.github.io/gpu-operator
helm repo update
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
sudo chmod 555 /etc/rancher/k3s/k3s.yaml
helm install --wait gpu-operator nvidia/gpu-operator   -n gpu-operator --create-namespace   --set driver.enabled=false

sudo systemctl restart k3s

# at this point, "kubectl describe node" should show "nvidia.com/gpu: <GPU count>" in both the Capacity and Allocatable sections.
  2. Label the GPU node
kubectl label node <NodeName> inferx_storage=data --overwrite
kubectl label node <NodeName> inferx_nodeType=inferx_file --overwrite
  3. Clone the inferx repo:
git clone https://github.com/inferx-net/inferx.git
cd inferx
  4. Adjust the nodeagent parameters: update k8s/nodeagent.yaml and k8s/ixproxy.yaml

4.1 Update the memory size: the model containers' resources are allocated from the nodeagent pod, so update the memory request/limit to a size suitable for the node's memory:

resources:
  requests:
    cpu: "20"
    memory: "180Gi"   # regular memory request (RAM)
  limits:
    cpu: "20"
    memory: "180Gi"   # regular memory limit (RAM)
    nvidia.com/gpu: 1

4.2 Update the cache size: CACHE_MEMORY is the amount of CPU memory used to cache models. It is taken out of the memory size set in 4.1, so make sure it is no more than 50% of that value:

- name: CACHE_MEMORY
  value: "90Gi"
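The 50% constraint between the two settings above can be checked with a couple of lines of shell arithmetic. This is just a sanity-check sketch; the values are the defaults shown in the snippets above.

```shell
#!/bin/sh
NODE_MEM_GI=180   # nodeagent memory request from step 4.1, in Gi
CACHE_MEM_GI=90   # CACHE_MEMORY from step 4.2, in Gi
# the cache must not exceed 50% of the nodeagent memory request
if [ $((CACHE_MEM_GI * 2)) -le "$NODE_MEM_GI" ]; then
  echo "cache size OK"
else
  echo "CACHE_MEMORY must be <= 50% of the nodeagent memory request"
fi
```

With the default 90Gi cache against a 180Gi request, the check passes at exactly the 50% limit.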

  5. Start the pods: in the inferx folder, run
make runkblob
  6. Check the website at http://<NodeIP>:31250/demo/

Deploy models

  1. Use the following command to download a model (replace Qwen/Qwen2.5-Coder-1.5B-Instruct with the model you want to deploy)
sudo docker run --rm --network host -v /opt/inferx/cache:/models     inferx/inferx_hfdownload:v0.1.0 Qwen/Qwen2.5-Coder-1.5B-Instruct
  2. Set up the environment variables
export INFX_GATEWAY_URL="http://localhost:31501" # inferx_one exposes 31501 as the node port
export IFERX_APIKEY="87831cdb-d07a-4dc1-9de0-fb232c9bf286" # the admin apikey, configured in the nodeagent yaml
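A quick format check on the apikey can catch copy/paste mistakes before the ixctl calls below fail with an authentication error. This is a sketch and not part of the InferX tooling; it only verifies that the value looks like a UUID (8-4-4-4-12 hex digits).

```shell
#!/bin/sh
IFERX_APIKEY="87831cdb-d07a-4dc1-9de0-fb232c9bf286"
# verify the 8-4-4-4-12 hex-digit UUID shape
if echo "$IFERX_APIKEY" | grep -Eiq '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'; then
  echo "apikey format OK"
else
  echo "apikey does not look like a UUID" >&2
fi
```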
  3. Submit the first model
cd inferx/config
/opt/inferx/bin/ixctl create public.json # create tenant
/opt/inferx/bin/ixctl create Qwen_namespace.json # create namespace
/opt/inferx/bin/ixctl create Qwen2.5-Coder-1.5B-Instruct.json # create first model