Setup K3s in LXD
Steps to perform:
- Bump up the kernel limits on the Host machine:
sudo sysctl -n -w fs.inotify.max_user_instances=1048576
sudo sysctl -n -w fs.inotify.max_queued_events=1048576
sudo sysctl -n -w fs.inotify.max_user_watches=1048576
sudo sysctl -n -w vm.max_map_count=262144
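These sysctl changes do not survive a host reboot. One way to persist them, assuming a host that reads /etc/sysctl.d/ (the file name 90-lxd-k8s.conf below is just a placeholder), is a drop-in file:
# Write the limits to a sysctl.d drop-in so they are re-applied on every boot
cat <<'EOF' | sudo tee /etc/sysctl.d/90-lxd-k8s.conf
fs.inotify.max_user_instances = 1048576
fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_watches = 1048576
vm.max_map_count = 262144
EOF
# Reload all sysctl configuration files
sudo sysctl --system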
- Launch an LXD container and add the following to the container profile: https://www.qblocks.cloud/host/lxc/k3d-profile-example.txt (a rough sketch of the key settings follows below)
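The linked file is the source of truth for the profile. As a rough, illustrative sketch only: running Kubernetes inside LXD typically requires nesting (and often privileged mode) on the container. The profile name, image, and container name below are placeholders, not part of the linked example:
# Illustrative only -- use the settings from k3d-profile-example.txt
lxc profile create k3d
lxc profile set k3d security.nesting true
lxc profile set k3d security.privileged true
lxc launch ubuntu:22.04 k3s-container --profile default --profile k3d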
- Create rc.local inside the container and add the following content:
sudo vim /etc/rc.local
#!/bin/bash
apparmor_parser --replace /var/lib/snapd/apparmor/profiles/snap.microk8s.*
exit 0
sudo chmod +x /etc/rc.local
- Reboot the Container
More details here: https://ubuntu.com/blog/running-kubernetes-inside-lxd
The above steps should make the container ready for K3D support.
Once the host and container have been configured to support Kubernetes inside LXD, k3d can be installed in the container to run Kubernetes.
Install K3D in LXD container:
1. Install kubectl:
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
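To confirm the kubectl binary installed correctly:
# Should print the installed client version
kubectl version --client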
2. Install K3D:
wget -q -O - https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash
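To confirm the k3d install:
# Prints the installed k3d version
k3d version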
Steps to bring up a K3s cluster inside the LXC container
- K3s is a lightweight Kubernetes distribution built for edge deployments
1. Make sure nvidia-smi runs successfully inside the container
2. Install nvidia-container-toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/libnvidia-container.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
3. Make sure the default runtime for Docker is nvidia. Confirm with the command below and restart Docker; an example daemon.json is sketched after this step
cat /etc/docker/daemon.json
sudo systemctl restart docker
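If nvidia is not already the default runtime, a minimal daemon.json along the lines below should work with the toolkit installed in step 2 (back up any existing /etc/docker/daemon.json before overwriting it):
# Example only: make nvidia the default Docker runtime
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo systemctl restart docker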
4. Now, run the K3s cluster using the Docker runtime. By default, k3s prefers the containerd runtime, but for the GPU to work we need nvidia as the default runtime, which the steps above configured in Docker
sudo curl -sfL https://get.k3s.io | sh -s - --docker
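Because the cluster was started with --docker, its workload containers are managed by Docker rather than containerd; once the cluster is up, the k3s system pods (coredns, traefik, etc.) should show up as Docker containers:
# k3s-managed pods appear here when the Docker runtime is in use
sudo docker ps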
5. Make sure the k3s cluster is up and running (all pods should reach Running or Completed)
sudo k3s kubectl get pods --all-namespaces
6. Install the NVIDIA device plugin DaemonSet for K3s. This makes the host GPU available to the k3s cluster
sudo k3s kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml
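The DaemonSet pod name needed for the log check in the next step can be listed from the kube-system namespace:
# Find the nvidia-device-plugin pod created by the DaemonSet
sudo k3s kubectl get pods -n kube-system | grep nvidia-device-plugin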
7. Check the logs of the nvidia-device-plugin pod to confirm the GPUs are detected
sudo k3s kubectl logs <daemon set pod name> -n kube-system
I0812 05:23:47.267089 1 main.go:154] Starting FS watcher.
I0812 05:23:47.267213 1 main.go:161] Starting OS watcher.
I0812 05:23:47.267548 1 main.go:176] Starting Plugins.
I0812 05:23:47.267563 1 main.go:234] Loading configuration.
I0812 05:23:47.267689 1 main.go:242] Updating config with default resource matching patterns.
I0812 05:23:47.267884 1 main.go:253]
Running with config:
{
"version": "v1",
"flags": {
"migStrategy": "none",
"failOnInitError": false,
"nvidiaDriverRoot": "/",
"gdsEnabled": false,
"mofedEnabled": false,
"plugin": {
"passDeviceSpecs": false,
"deviceListStrategy": [
"envvar"
],
"deviceIDStrategy": "uuid",
"cdiAnnotationPrefix": "cdi.k8s.io/",
"nvidiaCTKPath": "/usr/bin/nvidia-ctk",
"containerDriverRoot": "/driver-root"
}
},
"resources": {
"gpus": [
{
"pattern": "*",
"name": "nvidia.com/gpu"
}
]
},
"sharing": {
"timeSlicing": {}
}
}
I0812 05:23:47.267893 1 main.go:256] Retreiving plugins.
I0812 05:23:47.268313 1 factory.go:107] Detected NVML platform: found NVML library
I0812 05:23:47.268378 1 factory.go:107] Detected non-Tegra platform: /sys/devices/soc0/family file not found
I0812 05:23:47.279615 1 server.go:165] Starting GRPC server for 'nvidia.com/gpu'
I0812 05:23:47.280859 1 server.go:117] Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock
I0812 05:23:47.283115 1 server.go:125] Registered device plugin for 'nvidia.com/gpu' with Kubelet
8. Validate that the GPU is detected by the K3s cluster node
sudo k3s kubectl describe nodes | grep nvidia
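A more targeted check is to print the node's allocatable resources, which should include nvidia.com/gpu with a non-zero count once the device plugin has registered:
# Look for "nvidia.com/gpu" in the output
sudo k3s kubectl get nodes -o jsonpath='{.items[*].status.allocatable}'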
9. If the GPU is recognised and the DaemonSet is not throwing errors, it is time to do a test run and make sure a pod can access the GPU. Make sure to run this container only on a node with a GPU.
- Create a YAML file named gputest.yaml with the following content:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.2.1-ubuntu18.04
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
- Depending on the CUDA version running inside the container, you need to pick the appropriate image from https://catalog.ngc.nvidia.com/orgs/nvidia/teams/k8s/containers/cuda-sample/tags to test
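nvidia-smi reports the driver version and the highest CUDA version it supports; the chosen cuda-sample tag should not require a newer CUDA version than that:
# The header line shows "Driver Version" and "CUDA Version"
nvidia-smi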
- Run the GPU pod:
sudo k3s kubectl apply -f gputest.yaml
sudo k3s kubectl logs gpu-pod
- On a successful run, the logs will look like the output below:
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
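Once the test passes, the test pod can be removed:
# Delete the test pod created from gputest.yaml
sudo k3s kubectl delete -f gputest.yaml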
**This confirms the K3s cluster was able to detect the GPU and that pods are able to run code on the GPU inside the Kubernetes cluster.**