Kubernetes (Optional) - eunki-7/llm-rdma-mlops-lab GitHub Wiki

Kubernetes (Optional)

Deploy vLLM on a Kubernetes cluster for scalable serving.


📦 Deployment

kubectl apply -f 40-k8s-optional/vllm-deploy.yaml
kubectl apply -f 40-k8s-optional/vllm-service.yaml

This creates a Deployment with 4 vLLM replicas and exposes them through a LoadBalancer Service.
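For orientation, a LoadBalancer Service for the vLLM pods looks roughly like the sketch below. The label selector and port values here are assumptions, not the repo's exact manifest — check 40-k8s-optional/vllm-service.yaml for the authoritative values.

```yaml
# Sketch of a LoadBalancer Service fronting the vLLM pods.
# Labels and ports are illustrative assumptions.
apiVersion: v1
kind: Service
metadata:
  name: vllm
spec:
  type: LoadBalancer      # provisions an external IP/LB on cloud providers
  selector:
    app: vllm             # must match the Deployment's pod labels
  ports:
    - port: 8000          # vLLM's default OpenAI-compatible API port
      targetPort: 8000
```

On bare-metal clusters without a cloud load balancer, a NodePort service (or MetalLB) is the usual substitute for type: LoadBalancer.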


⚙️ Requirements

  • Kubernetes cluster with GPU nodes
  • NVIDIA Device Plugin or GPU Operator installed

🔧 Notes

  • Adjust replicas in vllm-deploy.yaml to match the number of available GPUs
  • Update the hostPath for /models to match your node's storage layout
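Putting the two notes above together, the relevant parts of vllm-deploy.yaml look roughly like this. The image tag, labels, and host path are assumptions for illustration; the repo's manifest is authoritative.

```yaml
# Sketch of the vLLM Deployment: replicas sized to GPU count,
# one GPU per pod, and models mounted from the host.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
spec:
  replicas: 4                  # one pod per available GPU
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest   # assumed image; pin a version in practice
          resources:
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: models
              mountPath: /models
      volumes:
        - name: models
          hostPath:
            path: /mnt/models            # change to match your storage
```

hostPath ties pods to nodes that actually hold the model files; for multi-node clusters a shared PersistentVolume (e.g. NFS) is the more portable choice.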

🖼️ Architecture Diagram