Caffe cudnn5 NVidia GPU Docker Container - Sotera/watchman GitHub Wiki
NVidia and Docker
Unlike most Docker containers, NVidia GPU containers are tightly coupled to the specific version of the NVidia GPU drivers on the underlying operating system. Trying to keep these versions matched while building the container can be a frustrating exercise. NVidia's own installer conspires against you.
What Currently Works (2016-10-17)
An Ubuntu 14.04 AMI built with NVidia driver version 367.57. (Filename: NVIDIA-Linux-x86_64-367.57.run)
Details
configure_gpu.sh
The file watchman/services/configure_gpu.sh contains the detailed steps for configuring an Ubuntu 14.04 AMI with the above driver. Note that the driver installer is also run INSIDE the Docker container with the --no-kernel-module flag. This makes available the shared libraries required to build the CUDA version of Caffe.
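The two installer runs described above might look roughly like the sketch below. The --no-kernel-module flag comes from the text. The --silent option is an NVidia installer flag assumed here for unattended runs, and the function name nvidia_installer_cmd is hypothetical. The actual steps live in configure_gpu.sh and may differ.

```shell
#!/bin/sh
# Sketch only: print the installer command line for a given target.
# "host" installs the full driver; "container" skips the kernel module
# so that only the user-space shared libraries are installed.
nvidia_installer_cmd() {
    target="$1"
    version="${2:-367.57}"
    runfile="NVIDIA-Linux-x86_64-${version}.run"
    case "$target" in
        host)      echo "sh ${runfile} --silent" ;;
        container) echo "sh ${runfile} --silent --no-kernel-module" ;;
        *)         echo "unknown target: ${target}" >&2; return 1 ;;
    esac
}

# Example: show the command for each side without executing it.
nvidia_installer_cmd host
nvidia_installer_cmd container
```

Printing the command instead of running it keeps the sketch safe to try on a machine without a GPU.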
Dockerfile-ubuntu-cudnn5
This creates the new Docker layer holding the necessary NVidia, CUDA, and cuDNN 5 parts that the GPU-enabled Caffe container depends on.
Without GPU | With GPU | NVidia's Containers |
---|---|---|
FROM ubuntu:14.04 | FROM ubuntu:14.04 | FROM ubuntu:14.04 |
N/A | Dockerfile-ubuntu-cudnn5 | cuda:8.0-runtime |
N/A | (continued) | cuda:8.0-devel |
N/A | (continued) | cuda:8.0-cudnn |
Dockerfile-caffe | Dockerfile-caffe-cudnn5 | |
Dockerfile-ubuntu-cudnn5 is almost entirely based on the three Dockerfiles provided by NVidia. The one addition is a (partial) installation of the driver components, which supplies additional shared libraries that are needed.
Dockerfile-caffe-cudnn5
This file is very similar to the non-GPU file Dockerfile-caffe. The differences are:
Dockerfile-caffe | Dockerfile-caffe-cudnn5 | Comment |
---|---|---|
FROM ubuntu:14.04 | FROM sotera/ubuntu-cudnn5:1.0 | use the Docker container that provides the GPU dependencies |
make -j"$(nproc)" all | make all | Multi-threaded build fails on GPU AMI due to race condition |
https://raw.githubusercontent.com/Sotera/social-sandbox/event_detection/firmament/caffe/Makefile.config | https://raw.githubusercontent.com/Sotera/watchman/cudnn5/firmament/caffe/Makefile.config | Build Caffe with GPU and cuDNN 5 libraries |
Device mapping
The GPU Docker container requires additional flags:
--device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0
Alternatively, the Firmament configuration must pass this information via the Docker API, e.g.:
:
{
"name": "caffe",
"Image": "sotera/watchman-caffe-cudnn5:2.0",
"DockerFilePath": "",
"Hostname": "caffe",
"HostConfig": {
"VolumesFrom": [
"data-container"
],
"Links": [
"redis:redis"
],
"Devices" : [
{ "PathOnHost": "/dev/nvidiactl", "PathInContainer": "/dev/nvidiactl", "CgroupPermissions": "mrw"},
{ "PathOnHost": "/dev/nvidia-uvm", "PathInContainer": "/dev/nvidia-uvm", "CgroupPermissions": "mrw"},
{ "PathOnHost": "/dev/nvidia0", "PathInContainer": "/dev/nvidia0", "CgroupPermissions": "mrw"}
]
}
},
:
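For the command-line route, the --device flags can be assembled from whatever NVidia device nodes exist on the host. The following is a minimal sketch, not part of the watchman repository: the function name nvidia_device_flags is hypothetical, and the device root is a parameter only so the function can be tried on a machine without a GPU.

```shell
#!/bin/sh
# Sketch only: emit one --device flag per NVidia device node found
# under a device root (default /dev).
nvidia_device_flags() {
    dev_root="${1:-/dev}"
    flags=""
    for node in "$dev_root"/nvidiactl "$dev_root"/nvidia-uvm "$dev_root"/nvidia[0-9]*; do
        # Skip glob patterns that matched nothing and nodes that are absent.
        [ -e "$node" ] || continue
        flags="${flags}${flags:+ }--device=${node}"
    done
    echo "$flags"
}

# Example: print the docker run command rather than executing it.
echo "docker run $(nvidia_device_flags) sotera/watchman-caffe-cudnn5:2.0"
```

On a host with more than one GPU the glob would also pick up /dev/nvidia1 and so on, which the fixed flag list above does not cover.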
NVidia Drivers
NVidia drivers are obtained from us.download.nvidia.com. You are supposed to register first at NVidia Developer. The following wget will retrieve the version 367.57 driver:
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run
Note: One of the issues I ran into is that the Ubuntu 14.04 repo has version 367.48 as its latest version, but NVidia has pulled that driver from their download site. As a result, it is impossible to create a Docker container whose shared libraries match the resulting underlying OS version. You have to manually install version 367.57 on the AMI and then in the container. Because this version is just slightly ahead of Ubuntu's, everything works and no spurious updates occur.
Corollary: There are many references on the Internet to apt-get packages for NVidia drivers. DO NOT USE THESE. Use the .run file.
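Since the AMI and the container must use the same .run file, it can help to pin the version in one place and derive the download URL from it. The sketch below assumes the URL pattern shown for 367.57 also holds for other versions, which is not confirmed here. The function name nvidia_driver_url and the variable DRIVER_VERSION are hypothetical.

```shell
#!/bin/sh
# Sketch only: build the download URL for a given driver version,
# following the URL pattern of the 367.57 driver.
nvidia_driver_url() {
    version="$1"
    echo "http://us.download.nvidia.com/XFree86/Linux-x86_64/${version}/NVIDIA-Linux-x86_64-${version}.run"
}

# Pin the version once so the host and container installs stay matched.
DRIVER_VERSION="367.57"
echo "wget $(nvidia_driver_url "$DRIVER_VERSION")"
```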
Useful tools
- nvidia-smi -- This tool shows GPU activity and can be used as a first test for proper configuration. It should run in the Docker container, but it only shows useful results on the host because Docker processes are really host processes.
- dmesg | grep -iE "nvid|nvrm" -- Find NVidia related system messages
- ls /usr/lib/x86_64-linux-gnu | grep libnvidia-ml -- Determine version of installed shared libraries
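The library check in the last item can be turned into a small version comparison. This is a sketch under the assumption that the installed library is named libnvidia-ml.so.<version>, as with the 367.57 driver. The function name nvidia_lib_version is hypothetical.

```shell
#!/bin/sh
# Sketch only: print the driver version encoded in the file name
# libnvidia-ml.so.<version> found in a library directory.
nvidia_lib_version() {
    lib_dir="${1:-/usr/lib/x86_64-linux-gnu}"
    # The *.* pattern skips the libnvidia-ml.so.1 symlink.
    for f in "$lib_dir"/libnvidia-ml.so.*.*; do
        [ -e "$f" ] || continue
        echo "${f##*libnvidia-ml.so.}"
        return 0
    done
    return 1
}

# Compare against the version the host is expected to run.
expected="367.57"
found="$(nvidia_lib_version)" || found="none"
if [ "$found" = "$expected" ]; then
    echo "library version matches: $found"
else
    echo "library version mismatch: expected $expected, found $found"
fi
```

Running the same check on the host and inside the container gives a quick way to confirm that the two sides carry matching libraries.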