Kubernetes (Optional) - eunki-7/llm-rdma-mlops-lab GitHub Wiki

Kubernetes (Optional)

Deploy vLLM on a Kubernetes cluster for scalable serving.


📦 Deployment

kubectl apply -f 40-k8s-optional/vllm-deploy.yaml
kubectl apply -f 40-k8s-optional/vllm-service.yaml

This creates a Deployment with 4 vLLM replicas and exposes them through a LoadBalancer Service.
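For orientation, a LoadBalancer Service for the vLLM pods looks roughly like the sketch below. The label selector and port values here are assumptions, not the repo's exact manifest — check 40-k8s-optional/vllm-service.yaml for the authoritative values.

```yaml
# Sketch of a LoadBalancer Service fronting the vLLM pods.
# Labels and ports are illustrative assumptions.
apiVersion: v1
kind: Service
metadata:
  name: vllm
spec:
  type: LoadBalancer      # provisions an external IP/LB on cloud providers
  selector:
    app: vllm             # must match the Deployment's pod labels
  ports:
    - port: 8000          # vLLM's default OpenAI-compatible API port
      targetPort: 8000
```

On bare-metal clusters without a cloud load balancer, a NodePort service (or MetalLB) is the usual substitute for type: LoadBalancer.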


⚙️ Requirements

  • Kubernetes cluster with GPU nodes
  • NVIDIA Device Plugin or GPU Operator installed

🔧 Notes

  • Adjust replicas in vllm-deploy.yaml to match the number of available GPUs
  • Update the hostPath for /models to match your node's storage layout
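Putting the two notes above together, the relevant parts of vllm-deploy.yaml look roughly like this. The image tag, labels, and host path are assumptions for illustration; the repo's manifest is authoritative.

```yaml
# Sketch of the vLLM Deployment: replicas sized to GPU count,
# one GPU per pod, and models mounted from the host.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
spec:
  replicas: 4                  # one pod per available GPU
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest   # assumed image; pin a version in practice
          resources:
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: models
              mountPath: /models
      volumes:
        - name: models
          hostPath:
            path: /mnt/models            # change to match your storage
```

hostPath ties pods to nodes that actually hold the model files; for multi-node clusters a shared PersistentVolume (e.g. NFS) is the more portable choice.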

🖼️ Architecture Diagram