Containerization - beeldengeluid/dane-example-worker GitHub Wiki

Building the image

docker build -t dane-example-worker .

Running a container locally (for testing)

docker run \
    --mount type=bind,source="$(pwd)"/config,target=/root/.DANE \
    --mount type=bind,source="$(pwd)"/data,target=/data \
    --rm \
    dane-example-worker --run-test-file

Explanation of the options:

  • --mount binds a local directory to a target directory within the container, so the container can read/write from your filesystem
  • "$(pwd)" expands to the current directory you are in. This is required because Docker only accepts absolute paths as bind mount sources
  • Of course, "$(pwd)"/config and "$(pwd)"/data should correspond to actual, existing directories. You can change the paths to match your local filesystem. File permission issues may occur, because the owner of files created within a container is not the local user.
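Before the first run, make sure both mount sources exist. A minimal sketch (the directory names simply mirror the command above):

```shell
# Create the host-side directories for the two bind mounts:
#   ./config -> /root/.DANE  (worker configuration)
#   ./data   -> /data        (input/output files)
mkdir -p config data
```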

Specify all options before calling the image name, otherwise they will be treated as arguments to entrypoint.sh.

Make sure to rebuild the image every time the code changes, so that the changes are propagated into the Docker image.

Running the container locally using CUDA compatible GPU

For workers that run AI models, such as Whisper, it makes sense to utilize the GPU instead of the CPU, as it is far faster for model inference. Running such models will most likely also require CUDA dependencies, and depending on the CUDA version, the base image might differ. Base images that provide the required CUDA dependencies can be found here: https://hub.docker.com/r/nvidia/cuda/tags

A tag can look like this:

12.4.1-runtime-ubuntu22.04

Here, 12.4.1 is the CUDA version, runtime is the "flavor" (in this case, it includes the CUDA runtime (cudart), the CUDA math libraries, and NCCL; for more details, check the "Overview of Images" section on the official CUDA Docker image page), and ubuntu22.04 is the OS version of the image.

It is recommended to only change the CUDA version, as the flavor and OS already provide most of the dependencies needed to run your AI model.
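For example, the first line of the Dockerfile could then look like this (the tag shown is just one possible choice; pick the CUDA version your model's dependencies require):

```dockerfile
# Example base image: CUDA 12.4.1, runtime flavor, Ubuntu 22.04
FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04
```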

Additionally, you will need to install Python within the image. Add this to the Dockerfile after the first line:

RUN apt-get update && \
    apt-get install -y python3-pip python3-dev python-is-python3 && \
    rm -rf /var/lib/apt/lists/*

Once the Dockerfile is set up, if you have a CUDA compatible GPU and want to test your worker locally using Docker, you also need to configure the NVIDIA Container Toolkit within WSL. To do so, follow the installation steps and configure the Docker runtime.
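Following the NVIDIA Container Toolkit documentation, the configuration roughly comes down to the commands below (after adding NVIDIA's apt repository as described in their install guide; exact steps may change between toolkit versions):

```shell
# Install the NVIDIA Container Toolkit
sudo apt-get install -y nvidia-container-toolkit

# Let the toolkit rewrite Docker's daemon config to register the nvidia runtime
sudo nvidia-ctk runtime configure --runtime=docker

# Restart the Docker daemon so it picks up the new runtime
sudo systemctl restart docker
```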

Additionally, we have seen cases where the GPU would still not be used, even though it should have been. To fix this, we installed Docker separately within the WSL Ubuntu environment, as follows:

  1. Make sure that Docker Desktop is not running on your Windows machine
  2. Install docker.io in WSL:
sudo apt-get install docker.io
  3. Follow the steps here to make sure it is working properly
  4. Rebuild the image from within WSL

Essentially, we install a new Docker daemon in WSL that isn't linked to Windows' Docker Desktop. Whenever you use this version, make sure that Docker Desktop isn't running, or deactivate support for the WSL instance through the settings of Docker Desktop. Otherwise, the two installations conflict and you will get unexpected behavior.

Lastly, set DEVICE in WHISPER_ASR_SETTINGS of config.yml to cuda, then build the image (as described above) and run the following command (from within the repository folder):

docker run \
    --gpus=all \
    --mount type=bind,source="$(pwd)"/config,target=/root/.DANE \
    --mount type=bind,source="$(pwd)"/data,target=/data \
    --rm \
    dane-example-worker --run-test-file
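For reference, the DEVICE change mentioned above would look something like this in config.yml (assumed structure; only the relevant key is shown, other keys under WHISPER_ASR_SETTINGS omitted):

```yaml
WHISPER_ASR_SETTINGS:
    DEVICE: cuda  # set to cpu when no CUDA compatible GPU is available
```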

Known issues

We have previously had trouble with the file permissions of the entrypoint, resulting in the following error message when running a container:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "./docker-entrypoint.sh": permission denied: unknown.

This issue can be fixed with:

chmod +x docker-entrypoint.sh

(and push this change to the GitHub repository)

File permission issues may also emerge with the input/output folders: if files are created from within the container, they are owned by root rather than by your local user. We tried running with --user "$(id -u):$(id -g)" (telling Docker to run the container with your current (local) user id) to prevent these file permission issues, but that was problematic for mounting the config.
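One workaround (an alternative sketch, not the approach the worker itself uses) is to reclaim ownership of the output files after the container has run:

```shell
# Files created by the container in ./data are owned by root;
# hand them back to the current local user (sudo is needed for root-owned files)
sudo chown -R "$(id -u):$(id -g)" data
```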

Also, we've had several problems with Poetry installations within images. It can help to use a newer base image (with a higher Python version). Another option is to circumvent the Poetry installation by exporting the dependencies to requirements.txt; see https://github.com/beeldengeluid/beng-lod-server/blob/main/Dockerfile for an example.
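The export step could look like this (note that recent Poetry versions require the poetry-plugin-export plugin for this command):

```shell
# Export the locked dependencies so the Dockerfile can pip-install them directly
poetry export -f requirements.txt --output requirements.txt --without-hashes

# The Dockerfile can then use, instead of a Poetry install:
#   RUN pip install --no-cache-dir -r requirements.txt
```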