Accessing GPU - ComputeCanada/ahep_interactive_analysis_facility GitHub Wiki

Driver installation

Once GPU nodes are provisioned, follow the instructions at https://docs.alliancecan.ca/wiki/Using_cloud_vGPUs to set up the GPUs on the nodes. Note that the NVIDIA driver installed this way (as of 2023/03/07) is version 470.103.01, which supports CUDA 11.4.

Alternatively (not tested), follow https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#package-manager to install the latest drivers.

Configure containerd to use the NVIDIA runtime

Follow the instructions at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#id6
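Once the configuration step is done, a quick sanity check can confirm that the containerd config mentions the NVIDIA runtime binary. The following is a minimal sketch, not part of the NVIDIA guide: the function name `containerd_has_nvidia_runtime` is hypothetical, and it assumes the config lives at the default path `/etc/containerd/config.toml` and references the binary by the name `nvidia-container-runtime` (the same binary used in the test command on this page).

```shell
#!/bin/sh
# Sanity check (an assumption, not taken from the NVIDIA guide): report whether
# a containerd config file mentions the NVIDIA runtime binary.
# Usage: containerd_has_nvidia_runtime [path-to-config.toml]
containerd_has_nvidia_runtime() {
    # The default path below is an assumption; pass another path if yours differs.
    config="${1:-/etc/containerd/config.toml}"
    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 2
    fi
    # -q: print nothing; the exit status is 0 only if the string is present.
    grep -q 'nvidia-container-runtime' "$config"
}
```

For example, `containerd_has_nvidia_runtime && echo "NVIDIA runtime found in config"`. A match only shows that the runtime is referenced in the file; it does not prove that containerd has been restarted or that the runtime works.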

Testing installation

If you installed the NVIDIA driver from Arbutus, the test at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#step-3-testing-the-installation will fail because the image used there does not match the installed driver version. An image based on cuda:11.4.0 succeeds instead:

ctr image pull docker.io/nvidia/cuda:11.4.0-base-ubi8
ctr run --rm -t --runc-binary=/usr/bin/nvidia-container-runtime --env NVIDIA_VISIBLE_DEVICES=all docker.io/nvidia/cuda:11.4.0-base-ubi8 cuda-11.4.0-base-ubi8 nvidia-smi

Tue Mar  7 20:15:49 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID V100D-8C       On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |    560MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
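Because the working image has to match the CUDA version reported by the driver, it can help to read that version from the `nvidia-smi` banner rather than hard-coding it. The sketch below is illustrative only: the helper names `cuda_version_from_smi` and `cuda_image_for_version` are hypothetical, and the tag pattern `<version>.0-base-ubi8` is an assumption extrapolated from the single tag used above, so check that the resulting tag exists before pulling it.

```shell
#!/bin/sh
# Read nvidia-smi output on stdin and print the value of the "CUDA Version"
# field from the banner line (for the output above, this is 11.4).
cuda_version_from_smi() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Build an image reference from a major.minor CUDA version.
# Assumption: the tag follows the pattern of the one used on this page
# (patch level .0, "base" flavour, ubi8); other versions may not have such a tag.
cuda_image_for_version() {
    printf 'docker.io/nvidia/cuda:%s.0-base-ubi8\n' "$1"
}
```

On a GPU node this could be used as `cuda_image_for_version "$(nvidia-smi | cuda_version_from_smi)"`, with the result passed to `ctr image pull`.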