Kubernetes (Optional) - eunki-7/llm-rdma-mlops-lab GitHub Wiki
Deploy vLLM on a Kubernetes cluster for scalable serving.
📦 Deployment
```shell
kubectl apply -f 40-k8s-optional/vllm-deploy.yaml
kubectl apply -f 40-k8s-optional/vllm-service.yaml
```
This creates 4 vLLM pods and exposes them via a LoadBalancer service.
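The manifests might look roughly like the sketch below. This is a hypothetical reconstruction, not the repo's actual files: the resource names, container image, model path, and ports (vLLM's default HTTP port is 8000) are all assumptions.

```yaml
# Hypothetical sketch of vllm-deploy.yaml and vllm-service.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
spec:
  replicas: 4                  # one pod per available GPU (see Notes)
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest       # official vLLM serving image (tag assumed)
        args: ["--model", "/models/my-model"] # assumed model location under the mount
        ports:
        - containerPort: 8000                 # vLLM's default serving port
        resources:
          limits:
            nvidia.com/gpu: 1                 # requires the NVIDIA Device Plugin
        volumeMounts:
        - name: models
          mountPath: /models
      volumes:
      - name: models
        hostPath:
          path: /models                       # adjust to your storage (see Notes)
---
apiVersion: v1
kind: Service
metadata:
  name: vllm
spec:
  type: LoadBalancer
  selector:
    app: vllm
  ports:
  - port: 80
    targetPort: 8000
```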
⚙️ Requirements
- Kubernetes cluster with GPU nodes
- NVIDIA Device Plugin or GPU Operator installed
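The device plugin matters because it is what advertises GPUs to the scheduler as the extended resource `nvidia.com/gpu`; without it, no pod can request a GPU. A pod claims one like this (fragment only):

```yaml
# Pod-spec fragment: the nvidia.com/gpu resource only exists once the
# NVIDIA Device Plugin or GPU Operator is running on the node.
resources:
  limits:
    nvidia.com/gpu: 1   # whole GPUs only; this resource is not fractional
```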
🔧 Notes
- Adjust `replicas` in `vllm-deploy.yaml` based on available GPUs
- Update the `hostPath` for `/models` to match your storage
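For example, if model weights live under `/data/models` on each node (an assumed path for illustration), the volume in `vllm-deploy.yaml` would be updated like so:

```yaml
# Sketch: point the hostPath at the node directory holding your model weights.
volumes:
- name: models
  hostPath:
    path: /data/models   # assumed host directory; replace with your own
    type: Directory      # fail fast if the directory is missing on the node
```

Note that `hostPath` requires the directory to exist on every node the pods can schedule to; for multi-node clusters a shared volume (e.g. NFS or a PersistentVolumeClaim) is the more common choice.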