Compute Worker Management Setup - codalab/codabench GitHub Wiki

Setup compute worker

Overview

Compute workers are simply machines that can send and receive Celery messages on the port used by the broker URL you wish to connect to, and that run a compute worker image or other software able to receive submissions. This means that you can add computing power to your competitions or benchmarks if needed! Any computer, from your own physical machines to virtual machines on cloud computing services, can be used for this purpose. You can add multiple workers to a queue to process several submissions simultaneously.

To use Podman, see the compute worker setup with Podman documentation.

To use Docker, follow the instructions below:

Steps:

  • Have a machine (either physical or virtual, 100 GB storage recommended)
  • Install Docker
  • Pull Compute Worker Image
  • Run the compute worker via Docker

Install Docker

Either:

a) Install docker via the installation script: https://docs.docker.com/install/linux/docker-ce/ubuntu/#install-using-the-convenience-script

curl https://get.docker.com | sudo sh
sudo usermod -aG docker $USER

b) Install manually, following the steps at: https://docs.docker.com/install/
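
After either installation route, it can help to confirm that your user really is in the docker group. The sketch below uses a hypothetical helper, in_group, built only on the standard id command. Called without a user name, id reports the groups of the current session, which do not change until you log out and back in after running usermod.

```shell
#!/bin/sh
# in_group GROUP [USER]: succeed if USER (default: the current session)
# belongs to GROUP. Hypothetical helper, not part of Docker or Codabench.
in_group() {
    # id -nG prints group names separated by spaces; compare whole names only.
    id -nG ${2:+"$2"} | tr ' ' '\n' | grep -qxF "$1"
}

# Example (not run here):
#   in_group docker || echo "log out and back in, or re-run usermod"
```

If in_group docker fails while in_group docker "$USER" succeeds, the group was added but the current session predates the change.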

Pull Compute Worker Image

On the compute worker machine, run the following command in a shell:

docker pull codalab/competitions-v2-compute-worker

That will pull the latest image for the v2 worker. For specific versions, see the docker hub page at: https://hub.docker.com/r/codalab/competitions-v2-compute-worker/tags

Start CPU worker

Create a file named .env with the following content:

# Queue URL
BROKER_URL=<desired broker URL>

# Location to store submissions/cache -- absolute path!
HOST_DIRECTORY=/codabench

# If SSL isn't enabled, then comment or remove the following line
BROKER_USE_SSL=True

Remarks:

  • The broker URL is a unique identifier of the job queue that the worker should listen to. To create a queue or obtain the broker URL of an existing queue, refer to the Queue Management wiki page.

  • /codabench -- this path needs to be mounted as a volume at /codabench inside the worker container, as you can see below. You can select another location if convenient.
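
Before starting the worker, a quick sanity check of the .env file can catch the two mistakes the remarks above warn about: a missing broker URL and a relative storage path. The following is a minimal sketch; check_env and _env_get are hypothetical helper names, and the checks cover only what this page states about the file.

```shell
#!/bin/sh
# _env_get FILE KEY: print the value of the last uncommented KEY=value line.
_env_get() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# check_env FILE: print one line per problem found, or "ok" if none.
# Returns 0 when the file looks usable, 1 otherwise.
check_env() {
    problems=0
    broker=$(_env_get "$1" BROKER_URL)
    host_dir=$(_env_get "$1" HOST_DIRECTORY)

    if [ -z "$broker" ]; then
        echo "BROKER_URL is missing or empty"; problems=1
    fi
    # The template value <desired broker URL> must be replaced.
    case $broker in
        *"<"*|*">"*) echo "BROKER_URL still contains a placeholder"; problems=1 ;;
    esac
    # The storage location must be an absolute path.
    case $host_dir in
        /*) ;;
        *) echo "HOST_DIRECTORY must be an absolute path"; problems=1 ;;
    esac

    [ "$problems" -eq 0 ] && echo "ok"
    return "$problems"
}
```

Running check_env .env in the directory that holds the file prints ok when both settings look plausible. It does not verify that the broker is reachable.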

Create a docker-compose.yml file and paste the following content in it:

# Codabench Worker
services:
    worker:
        image: codalab/competitions-v2-compute-worker:latest
        container_name: compute_worker
        volumes:
            - /codabench:/codabench
            - /var/run/docker.sock:/var/run/docker.sock
        env_file:
            - .env
        restart: unless-stopped
        logging:
            options:
                max-size: 50m
                max-file: 3

You can then launch the worker by running this command in the terminal where the docker-compose.yml file is located:

docker compose up -d

Alternatively, you can use the docker run command below:

docker run \
    -v /codabench:/codabench \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -d \
    --env-file .env \
    --name compute_worker \
    --restart unless-stopped \
    --log-opt max-size=50m \
    --log-opt max-file=3 \
    codalab/competitions-v2-compute-worker:latest

Start GPU worker

NVIDIA toolkit (new method)

Nvidia toolkit installation instructions

Once you install and configure the NVIDIA container toolkit, you can create a docker-compose.yml file with the following content:

# Codabench GPU worker (NVIDIA)
services:
    worker:
        image: codalab/competitions-v2-compute-worker:gpu
        container_name: compute_worker
        volumes:
            - /codabench:/codabench
            - /var/run/docker.sock:/var/run/docker.sock
        env_file:
            - .env
        restart: unless-stopped
        logging:
            options:
                max-size: 50m
                max-file: 3
        runtime: nvidia
        deploy:
            resources:
                reservations:
                    devices:
                        - driver: nvidia
                          count: all
                          capabilities:
                              - gpu

You can then launch the worker by running this command in the terminal where the docker-compose.yml file is located:

docker compose up -d
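
The runtime: nvidia key in the file above relies on Docker having a runtime registered under that name, which the NVIDIA container toolkit configuration step is expected to do. As a rough check, the sketch below inspects the JSON that docker info prints for its Runtimes field. The helper name has_nvidia_runtime is hypothetical, and the check only confirms that the runtime is registered, not that a GPU is usable.

```shell
#!/bin/sh
# has_nvidia_runtime: read on stdin the JSON printed by
#   docker info --format '{{json .Runtimes}}'
# and succeed if a runtime named "nvidia" is listed.
has_nvidia_runtime() {
    grep -q '"nvidia"'
}

# Example (not run here):
#   docker info --format '{{json .Runtimes}}' | has_nvidia_runtime \
#       || echo "nvidia runtime not registered with Docker"
```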

NVIDIA-docker Wrapper (old method)

Nvidia installation instructions

nvidia-docker run \
    -v /codabench:/codabench \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /var/lib/nvidia-docker/nvidia-docker.sock:/var/lib/nvidia-docker/nvidia-docker.sock \
    -d \
    --env-file .env \
    --name compute_worker \
    --restart unless-stopped \
    --log-opt max-size=50m \
    --log-opt max-file=3 \
    codalab/competitions-v2-compute-worker:gpu

Note that a competition docker image including CUDA and other GPU libraries, such as codalab/codalab-legacy:gpu, is then required.

Check logs

Use the following command to check logs and ensure everything is working fine:

docker logs -f compute_worker

Cleaning up periodically

It is recommended to clean up docker images and containers regularly to avoid filling up the storage.

  1. Run the following command:
sudo crontab -e
  2. Add the following line:
@daily docker system prune -af
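
If a daily prune is more aggressive than you want, one alternative is to prune only when the disk is actually filling up. The sketch below is an assumption-laden example, not part of Codabench: usage_percent and over_threshold are hypothetical helpers, the 80 percent limit is arbitrary, and /var/lib/docker is Docker's default data directory, which may differ on your machine.

```shell
#!/bin/sh
# usage_percent PATH: print the used-space percentage (integer, no % sign)
# of the filesystem holding PATH, taken from the POSIX "df -P" output.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# over_threshold PERCENT LIMIT: succeed if PERCENT is at or above LIMIT.
over_threshold() {
    [ "$1" -ge "$2" ]
}

# Example (not run here; the limit of 80 is an assumption):
#   over_threshold "$(usage_percent /var/lib/docker)" 80 \
#       && docker system prune -af
```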

Keep track of the worker

It is recommended to record the hostname of the worker's Docker container so that you can identify the worker. This makes troubleshooting easier when several workers share one queue. To get the hostname, run docker ps and read the CONTAINER ID column at the beginning of the output:

$ docker ps
CONTAINER ID   IMAGE                                           COMMAND                  CREATED      STATUS      PORTS     NAMES
1a2b3d4e5f67   codalab/competitions-v2-compute-worker:latest   "/bin/sh -c 'celery …"   3 days ago   Up 3 days             compute_worker

For each submission made to your queue, the server status page shows which worker ran the ingestion and scoring jobs.
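
The hostname can also be read without scanning the docker ps table. The sketch below assumes default Docker behaviour, where a container's hostname is the first 12 characters of its ID unless one was set explicitly. Both function names are hypothetical, and compute_worker is the container name used on this page.

```shell
#!/bin/sh
# worker_hostname [NAME]: print the hostname Docker assigned to the container
# (default name: compute_worker). Requires access to the Docker daemon.
worker_hostname() {
    docker inspect --format '{{.Config.Hostname}}' "${1:-compute_worker}"
}

# short_id ID: truncate a full container ID to the 12-character form that
# docker ps displays.
short_id() {
    printf '%.12s\n' "$1"
}

# Example (not run here): append the hostname to a local notes file.
#   echo "$(hostname) $(worker_hostname)" >> workers.txt
```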


Optional: put data directly inside the compute worker

The folder $HOST_DIRECTORY/data, usually /codabench/data, is shared between the host (the compute worker) and the container running the submission (a new container is created for each submission). It is mounted inside the container as /app/data. This means that you can place data on your worker, in $HOST_DIRECTORY/data, and it will be available read-only while the job runs. You will need to modify the scoring and/or ingestion programs so that they point to /app/data. This is especially useful if you work with confidential data or with a large dataset.

/!\ If you have several workers in your queue, remember to have the data accessible for each one.
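
With several workers, copying the data by hand is error-prone. The sketch below prints one rsync command per worker instead of running anything, so you can review the commands first. It assumes SSH access to each worker, write permission on /codabench/data there, and the default HOST_DIRECTORY. The function name sync_commands and the host names in the example are hypothetical.

```shell
#!/bin/sh
# sync_commands SRC HOST...: print (do not run) one rsync command per worker
# host that would copy the contents of SRC into /codabench/data on that host.
sync_commands() {
    src=$1
    shift
    for host in "$@"; do
        printf 'rsync -az %s/ %s:/codabench/data/\n' "$src" "$host"
    done
}

# Example (not run here; host names are placeholders):
#   sync_commands ./dataset user@worker1 user@worker2        # review
#   sync_commands ./dataset user@worker1 user@worker2 | sh   # execute
```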


If you simply wish to set up some compute workers to increase the computing power of your benchmark, you don't need to scroll this page any further.


Building compute worker

This is helpful only if you want to build the compute worker image. It is not needed if you simply want to set up compute workers to run submissions.

To build the normal image:

docker build -t codalab/competitions-v2-compute-worker:latest -f Dockerfile.compute_worker .

To build the GPU version:

docker build -t codalab/competitions-v2-compute-worker:gpu -f Dockerfile.compute_worker_gpu .

To push the updated image (append a tag such as :latest or :gpu if needed):

docker push codalab/competitions-v2-compute-worker

If you have running compute workers, you will need to pull the image again and restart the workers for the changes to take effect.

Worker management

Outside of docker containers install Fabric like so:

pip install fab-classic==1.17.0

Create a server_config.yaml in the root of this repository using:

cp server_config_sample.yaml server_config.yaml

Below is an example server_config.yaml that defines two roles, comp-gpu and comp-cpu: one with GPU workers (is_gpu set to true and the GPU docker_image) and one with CPU workers:

comp-gpu:
  hosts:
    - [email protected]
    - [email protected]
  broker_url: pyamqp://user:pass@host:port/vhost-gpu
  is_gpu: true
  docker_image: codalab/competitions-v2-compute-worker:gpu

comp-cpu:
  hosts:
    - [email protected]
  broker_url: pyamqp://user:pass@host:port/vhost-cpu
  is_gpu: false
  docker_image: codalab/competitions-v2-compute-worker:latest

You can of course create your own docker_image and specify it here.

You can execute commands against a role:

❯ fab -R comp-gpu status
..
[[email protected]] out: CONTAINER ID        IMAGE                                           COMMAND                  CREATED             STATUS              PORTS               NAMES
[[email protected]] out: 1d318268bee1        codalab/competitions-v2-compute-worker:gpu   "/bin/sh -c 'celery …"   2 hours ago         Up 2 hours                              hardcore_greider
..

❯ fab -R comp-gpu update
..
(updates workers)

See available commands with fab -l

Update docker image

If the compute worker docker image was updated, you can reflect the changes using the following commands.

Check no job is running:

docker ps
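
Because the worker starts submission containers through the host's Docker socket, a running job should appear in docker ps next to the worker itself. The sketch below filters the worker's own name out of the list so that anything left stands out. busy_containers is a hypothetical helper, and unrelated containers running on the same host will also be listed.

```shell
#!/bin/sh
# busy_containers [WORKER_NAME]: read container names (one per line) on stdin
# and print every name except the worker's own (default: compute_worker).
busy_containers() {
    # grep exits non-zero when nothing is left; treat that as success.
    grep -v -x -F "${1:-compute_worker}" || true
}

# Example (not run here):
#   docker ps --format '{{.Names}}' | busy_containers
# Empty output suggests that only the worker container is running.
```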

Update the worker:

docker stop compute_worker
docker rm compute_worker
docker pull codalab/competitions-v2-compute-worker:latest    # or other relevant docker image
# Alternatively, run "docker compose up -d" instead of the docker run command below
docker run \
    -v /codabench:/codabench \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -d \
    --env-file .env \
    --name compute_worker \
    --restart unless-stopped \
    --log-opt max-size=50m \
    --log-opt max-file=3 \
    codalab/competitions-v2-compute-worker:latest            # or other relevant docker image