Accessing GPU - ComputeCanada/ahep_interactive_analysis_facility GitHub Wiki
Driver installation
Once the GPU nodes are provisioned, follow the instructions at https://docs.alliancecan.ca/wiki/Using_cloud_vGPUs to set up the GPUs on the nodes. Note that, as of 2023/03/07, the NVIDIA driver installed this way is version 470.103.01, which supports up to CUDA 11.4.
Alternatively (not tested), follow https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#package-manager to install the latest drivers.
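Whichever route is used, it can help to confirm that the driver is loaded and to record its version before moving on. The helper below is a minimal sketch: `parse_gpu_query` is a hypothetical name, and it assumes `nvidia-smi` is on the PATH and prints comma-separated fields when called with `--query-gpu=name,driver_version --format=csv,noheader`.

```shell
#!/usr/bin/env bash
# Sketch: report GPU name and driver version after driver installation.
# Assumption: nvidia-smi is installed and on PATH on the GPU node.
# parse_gpu_query is a hypothetical helper, not part of any NVIDIA tool.

# Read lines like "GRID V100D-8C, 470.103.01" on stdin and
# print them as "<gpu name>|<driver version>".
parse_gpu_query() {
  local name version
  while IFS=',' read -r name version; do
    # nvidia-smi's CSV output puts a space after each comma; drop it.
    version="${version# }"
    printf '%s|%s\n' "$name" "$version"
  done
}

# Example usage on a GPU node (not run here):
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | parse_gpu_query
```

On a node set up with the Alliance instructions as of 2023/03/07, this should report driver 470.103.01.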
Configure containerd to use the NVIDIA runtime
Follow the instructions at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#id6
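After editing the containerd configuration, a quick check that the NVIDIA runtime entry is actually present can save a debugging round trip. This is a sketch under assumptions: the default path `/etc/containerd/config.toml` and an entry of the form `BinaryName = "/usr/bin/nvidia-container-runtime"` are assumed to match the example in NVIDIA's guide, and `has_nvidia_runtime` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check whether a containerd config registers the NVIDIA runtime.
# Assumptions (not from this wiki): config lives at /etc/containerd/config.toml
# and the runtime options set BinaryName to /usr/bin/nvidia-container-runtime.
# Returns 0 if found, 1 if not found, 2 if the config cannot be read.
has_nvidia_runtime() {
  local config="${1:-/etc/containerd/config.toml}"
  [ -r "$config" ] || return 2
  grep -Eq 'BinaryName[[:space:]]*=[[:space:]]*"/usr/bin/nvidia-container-runtime"' "$config"
}

# Example usage (not run here):
#   has_nvidia_runtime && echo "NVIDIA runtime configured" || echo "not configured"
#   sudo systemctl restart containerd   # containerd only rereads its config on restart
```

The exit codes distinguish a missing config file from one that simply lacks the NVIDIA entry.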
Testing installation
If you installed the NVIDIA driver from Arbutus, the test at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#step-3-testing-the-installation will fail because the CUDA version of its test image does not match the CUDA version supported by the installed driver (11.4). Running an image built for CUDA 11.4.0 instead succeeds:
ctr image pull docker.io/nvidia/cuda:11.4.0-base-ubi8
ctr run --rm -t --runc-binary=/usr/bin/nvidia-container-runtime --env NVIDIA_VISIBLE_DEVICES=all docker.io/nvidia/cuda:11.4.0-base-ubi8 cuda-11.4.0-base-ubi8 nvidia-smi
Tue Mar  7 20:15:49 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID V100D-8C       On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |    560MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
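Because the smoke test only succeeds when the image's CUDA version is supported by the installed driver, the image tag can be derived from the "CUDA Version" field in the nvidia-smi header instead of being hard-coded. A minimal sketch, assuming the `<X.Y>.0-base-ubi8` tag naming used above also exists for the reported version (verify on Docker Hub before relying on it); `cuda_version_from_smi` and `cuda_base_image` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: pick an nvidia/cuda test image matching the driver's CUDA version.
# Assumption: a "<X.Y>.0-base-ubi8" tag exists for the reported version
# (true for 11.4.0 as used in this wiki; check other versions yourself).

# Read nvidia-smi output on stdin and print the "X.Y" from "CUDA Version: X.Y".
cuda_version_from_smi() {
  sed -n 's/.*CUDA Version: \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Build the image reference for a given "X.Y" CUDA version.
cuda_base_image() {
  printf 'docker.io/nvidia/cuda:%s.0-base-ubi8\n' "$1"
}

# Example usage on a GPU node (not run here):
#   ver=$(nvidia-smi | cuda_version_from_smi)
#   img=$(cuda_base_image "$ver")
#   ctr image pull "$img"
#   ctr run --rm -t --runc-binary=/usr/bin/nvidia-container-runtime \
#     --env NVIDIA_VISIBLE_DEVICES=all "$img" "cuda-${ver}.0-base-ubi8" nvidia-smi
```

With the driver shown above, this resolves to the same `docker.io/nvidia/cuda:11.4.0-base-ubi8` image used in the manual test.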