Caffe cudnn5 NVidia GPU Docker Container - Sotera/watchman GitHub Wiki

NVidia and Docker

Unlike most Docker containers, NVidia GPU containers are tightly coupled to the specific version of the NVidia GPU drivers on the underlying operating system. Trying to keep these versions matched while building the container can be a frustrating exercise. NVidia's own installer conspires against you.

What Currently Works (2016-10-17)

An Ubuntu 14.04 AMI built with NVidia driver version 367.57. (Filename: NVIDIA-Linux-x86_64-367.57.run)
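Since everything hinges on the host and container agreeing on 367.57, a quick version check can save a failed build. The following is a minimal sketch, not part of the watchman repository: the function names are hypothetical, and it assumes nvidia-smi is on the PATH and supports the --query-gpu option.

```shell
# Expected driver version, taken from the section above.
EXPECTED_DRIVER="367.57"

# Succeed only if the given version string equals the expected one.
driver_matches() {
    [ "$1" = "$EXPECTED_DRIVER" ]
}

# Ask nvidia-smi for the installed driver version (first GPU only).
installed_driver_version() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Usage (needs a machine with the driver installed):
#   driver_matches "$(installed_driver_version)" || echo "driver mismatch" >&2
```

Running the same check on the host and inside the container should give the same answer if the versions are matched.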

Details

configure_gpu.sh

The file watchman/services/configure_gpu.sh contains the detailed steps for configuring an Ubuntu 14.04 AMI with the above driver. Note that the driver file is also run INSIDE the Docker container with the --no-kernel-module flag. This installs only the user-space shared libraries, which the build of the CUDA version of Caffe requires.
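The difference between the host and container invocations can be sketched as below. This is an illustration only and may not match configure_gpu.sh: the --silent flag is my assumption for unattended installs, and installer_cmd is a hypothetical helper that only prints the command rather than running it.

```shell
# Driver installer file named in the section above.
RUN_FILE="NVIDIA-Linux-x86_64-367.57.run"

# Print the installer command line for a target: "host" or "container".
# The host gets the full driver; the container skips the kernel module
# and receives only the user-space shared libraries.
installer_cmd() {
    case "$1" in
        host)      echo "sh $RUN_FILE --silent" ;;
        container) echo "sh $RUN_FILE --silent --no-kernel-module" ;;
        *)         echo "unknown target: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   eval "$(installer_cmd container)"   # e.g. from a Dockerfile RUN step
```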

Dockerfile-ubuntu-cudnn5

This creates the new Docker layer holding the necessary NVidia, CUDA, and cuDNN 5 parts that the GPU-enabled Caffe container depends on.

| Without GPU | With GPU | NVidia's Containers |
| --- | --- | --- |
| FROM ubuntu:14.04 | FROM ubuntu:14.04 | FROM ubuntu:14.04 |
| N/A | Dockerfile-ubuntu-cudnn5 | cuda:8.0-runtime |
| N/A | (continued) | cuda:8.0-devel |
| N/A | (continued) | cuda:8.0-cudnn |
| Dockerfile-caffe | Dockerfile-caffe-cudnn5 | |

Dockerfile-ubuntu-cudnn5 is almost entirely based on the three Dockerfiles provided by NVidia with the addition of the (partial) installation of the driver components to get additional needed shared libraries.

Dockerfile-caffe-cudnn5

This file is very similar to the non-GPU file Dockerfile-caffe. The differences are:

| Dockerfile-caffe | Dockerfile-caffe-cudnn5 | Comment |
| --- | --- | --- |
| FROM ubuntu:14.04 | FROM sotera/ubuntu-cudnn5:1.0 | Use the Docker container that provides the GPU dependencies |
| make -j"$(nproc)" all | make all | Multi-threaded build fails on GPU AMI due to race condition |
| https://raw.githubusercontent.com/Sotera/social-sandbox/event_detection/firmament/caffe/Makefile.config | https://raw.githubusercontent.com/Sotera/watchman/cudnn5/firmament/caffe/Makefile.config | Build Caffe with GPU and cuDNN 5 libraries |

Device mapping

The GPU Docker container requires additional flags:

--device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0
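On hosts with more than one GPU the list of device nodes grows (/dev/nvidia1 and so on), so it can help to generate the flags rather than type them. A minimal sketch follows; device_flags is a hypothetical helper, and the image name in the usage comment is the one from the Firmament example on this page.

```shell
# Print one --device flag for each device node given as an argument.
device_flags() {
    flags=""
    for dev in "$@"; do
        flags="$flags --device=$dev"
    done
    # Drop the leading space left by the loop.
    echo "${flags# }"
}

# Usage (unquoted on purpose so the flags split into separate words):
#   docker run $(device_flags /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia0) \
#       sotera/watchman-caffe-cudnn5:2.0 nvidia-smi
```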

Alternatively, the Firmament configuration must pass this information via the Docker API, e.g.:

:
  {
    "name": "caffe",
    "Image": "sotera/watchman-caffe-cudnn5:2.0",
    "DockerFilePath": "",
    "Hostname": "caffe",
    "HostConfig": {
      "VolumesFrom": [
        "data-container"
      ],
      "Links": [
        "redis:redis"
      ],
      "Devices" : [
        { "PathOnHost": "/dev/nvidiactl", "PathInContainer": "/dev/nvidiactl", "CgroupPermissions": "mrw"},
        { "PathOnHost": "/dev/nvidia-uvm", "PathInContainer": "/dev/nvidia-uvm", "CgroupPermissions": "mrw"},
        { "PathOnHost": "/dev/nvidia0", "PathInContainer": "/dev/nvidia0", "CgroupPermissions": "mrw"}
      ]
    }
  },
:
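Each PathOnHost entry has to exist on the host before the container starts, and in my experience /dev/nvidia-uvm in particular may be absent until the driver has been used once. A small pre-flight check, sketched here with a hypothetical missing_devices helper, can catch that early.

```shell
# Print every path from the argument list that does not exist on this host.
missing_devices() {
    for dev in "$@"; do
        [ -e "$dev" ] || echo "$dev"
    done
    return 0
}

# Usage:
#   missing=$(missing_devices /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia0)
#   [ -z "$missing" ] || echo "missing device nodes: $missing" >&2
```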

NVidia Drivers

NVidia drivers are obtained from us.download.nvidia.com. You are supposed to register first at NVidia Developer. The following wget command retrieves the version 367.57 driver:

wget http://us.download.nvidia.com/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run
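If a different driver version is needed later, the URL can be built from the version number. This sketch assumes that other versions follow the same path pattern as the 367.57 URL above, which this page does not confirm; driver_url is a hypothetical helper.

```shell
# Build the download URL for a given driver version, following the
# pattern of the 367.57 URL (assumed to hold for other versions too).
driver_url() {
    echo "http://us.download.nvidia.com/XFree86/Linux-x86_64/$1/NVIDIA-Linux-x86_64-$1.run"
}

# Usage:
#   wget "$(driver_url 367.57)"
```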

Note: One of the issues I ran into is that the Ubuntu 14 repo has version 367.48 as its latest version, but NVidia has pulled this driver from their download site. As a result, if the host driver comes from the repo, it is impossible to build a Docker container whose shared libraries match the driver version on the underlying OS. You have to manually install version 367.57 on the AMI and then in the container. Because this version is just slightly ahead of Ubuntu's, everything works and no spurious updates occur.

Corollary: There are many references to apt-get packages for NVidia drivers on the Internet. DO NOT USE THESE. Use the .run file.

Useful tools

  • nvidia-smi -- This tool shows GPU activity and can be used as a first test for proper configuration. It should run in the Docker container, but it only shows useful results on the host because Docker processes are really host processes.
  • dmesg | grep -Ei "nvid|nvrm" -- Find NVidia-related system messages (the -E flag is needed so that | is treated as alternation)
  • ls /usr/lib/x86_64-linux-gnu | grep libnvidia-ml -- Determine version of installed shared libraries
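The library check in the last item can be turned into a version string for comparison against the driver. The sketch below assumes the versioned library is named libnvidia-ml.so.<version>, as it is for the .run installer; lib_version is a hypothetical helper.

```shell
# Extract the driver version from a file name such as
# libnvidia-ml.so.367.57; prints nothing for unversioned names or
# soname links like libnvidia-ml.so.1.
lib_version() {
    echo "$1" | sed -n 's/^libnvidia-ml\.so\.\([0-9][0-9]*\.[0-9][0-9.]*\)$/\1/p'
}

# Usage:
#   ls /usr/lib/x86_64-linux-gnu | grep libnvidia-ml | while read -r f; do
#       lib_version "$f"
#   done
```

If the printed version differs from the host driver version, the container and host are out of step.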

References