# Nvidia and k3s
Support for the Nvidia runtime (v2) was added in k3s v1.22, so be sure to use that version or newer.
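If you are not sure which version the node is running, a quick check (not part of the original steps) is:

```sh
# Print the k3s version installed on the node
k3s --version

# Or, with cluster access, check the kubelet version reported per node
kubectl get nodes -o wide
```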
## Install Nvidia drivers
List the driver packages available for your release:

```sh
apt search nvidia-driver
```
At the time of writing, the latest driver available for my Ubuntu 20.04 is version 470, so:
```sh
sudo apt install nvidia-headless-470-server
```
The "headless" and "server" is for avoid install X11 things.
## Install Nvidia container toolkit
```sh
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
  && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```
```sh
sudo apt-get update
sudo apt-get install -y nvidia-container-runtime nvidia-utils-470-server libnvidia-encode-470-server
```
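When k3s (re)starts, it detects the `nvidia-container-runtime` binary and adds an `nvidia` runtime to the containerd config it generates. A quick way to verify this (the path below is the k3s default; on a dedicated agent node the service is `k3s-agent`):

```sh
# Restart k3s so it regenerates its containerd configuration
sudo systemctl restart k3s

# The generated config should now contain an "nvidia" runtime entry
grep -A 2 nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml
```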
## containerd test
```sh
sudo ctr image pull docker.io/nvidia/cuda:11.0.3-base-ubuntu20.04
sudo ctr run --rm --gpus 0 -t docker.io/nvidia/cuda:11.0.3-base-ubuntu20.04 cuda-11.0.3-base-ubuntu20.04 nvidia-smi
```
The output will be something like this:
```
Thu May 26 15:36:18 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06   Driver Version: 470.129.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P1000        Off  | 00000000:0A:00.0 Off |                  N/A |
| 30%   44C    P0    N/A /  N/A |      0MiB /  4040MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
## Pod test
Create a `gpu-operator-test.yaml` with the following contents:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-operator-test
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
    - name: cuda-vector-add
      image: "nvidia/samples:vectoradd-cuda11.2.1"
      resources:
        limits:
          nvidia.com/gpu: 1
```
```sh
kubectl apply -f gpu-operator-test.yaml
```
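Once the pod has completed, its logs should confirm the GPU was usable; the vectoradd sample ends with `Test PASSED`:

```sh
# Should print the vector addition result followed by "Test PASSED"
kubectl logs gpu-operator-test
```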
The `nvidia` RuntimeClass referenced by the pod is defined here.
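For reference, a minimal RuntimeClass that maps pods to the `nvidia` containerd runtime handler looks like this (a sketch; the actual manifest linked above may differ):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
```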