DeepLabCut on LUNARC - NRC-Lund/multipark-aiml GitHub Wiki

DeepLabCut is a toolbox for markerless pose estimation used extensively in animal research. DeepLabCut can be difficult to install and slow to run on a regular computer, especially when training your own models. However, we have installed DeepLabCut on the LUNARC GPU cluster COSMOS, which is free to use for Lund University researchers.

Getting access to LUNARC

Please follow this guide to get access.

Testing DeepLabCut

LUNARC uses a module system to dynamically load pre-installed software. To load the DeepLabCut module, you need to open a terminal and issue the following commands:

ml GCC/11.3.0
ml OpenMPI/4.1.4
ml DeepLabCut/2.3.6-CUDA-11.7.0

To check that DeepLabCut is properly installed, start Python:

python

and then, within Python, import the DeepLabCut module:

import deeplabcut

It should now tell you that DLC 2.3.6 is loaded.
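If you would rather check the version without opening an interactive session, a minimal sketch like the one below may help. It assumes the package metadata is registered under the distribution name `deeplabcut` (the name used on PyPI), which may not hold for every installation. The helper `installed_version` is a hypothetical name and not part of DeepLabCut.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the loaded module registers itself as "deeplabcut".
    version = installed_version("deeplabcut")
    if version is None:
        print("DeepLabCut not found; have the modules been loaded?")
    else:
        print(f"DeepLabCut {version} is available")
```

Because this only reads package metadata, it avoids the slow import of DeepLabCut itself.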

Note that DLC is installed without a graphical user interface. You therefore need to label your images on a different computer and then copy your project to COSMOS.

Running DeepLabCut on an NVIDIA A100 GPU node

When running DLC directly in the terminal, as we did above, we run it on the "login" node, which does not give access to the GPUs. To run it on the GPUs, we need to submit the job through the resource manager SLURM. This is done in three steps:

1. Write a Python script that does what you want.
2. Write a shell script that calls the Python script and also sets up the appropriate environment (like loading the DLC module).
3. Execute the shell script with the sbatch command.
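To see for yourself whether the node you are currently on exposes a GPU, a small check along the following lines could be run both on the login node and from inside a submitted job. This is only a sketch: it assumes the NVIDIA tool `nvidia-smi` is on the PATH of the GPU nodes, and `list_gpus` is a hypothetical helper name.

```python
import shutil
import subprocess


def list_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # the tool is not on PATH, so no GPU can be queried
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []  # the tool is present but found no usable GPU or driver
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    print(f"{len(gpus)} GPU(s) visible: {gpus}")
```

An empty list means that no GPU could be queried from the current node, which is what you would expect outside a GPU job.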

Here is an example Python script that I have named train.py:

import deeplabcut as dlc
config_path="/home/phalje/Documents/DLC/Test-PH-2023-09-05/config.yaml"
dlc.train_network(config_path, shuffle=1, saveiters=10000, maxiters=100000)
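A batch job that fails on a mistyped path only reports the error after it has waited in the queue, so it can be worth validating the path first. The variant below is a sketch under that assumption: it takes the config path as a command-line argument and checks that the file exists before importing DeepLabCut. The function names `check_config` and `train` are hypothetical. The call to `train_network` uses the same arguments as the script above.

```python
import sys
from pathlib import Path


def check_config(path_text: str) -> Path:
    """Return the config path, raising FileNotFoundError if the file is missing."""
    path = Path(path_text).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No DeepLabCut config file at {path}")
    return path


def train(path_text: str) -> None:
    """Validate the config path, then start training."""
    config_path = check_config(path_text)
    # Imported here so that a bad path fails fast, before the slow import.
    import deeplabcut as dlc

    dlc.train_network(str(config_path), shuffle=1, saveiters=10000, maxiters=100000)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        train(sys.argv[1])
    else:
        print("usage: python train.py /path/to/config.yaml")
```

With this variant, the script is called as `python train.py` followed by the path to your own config.yaml.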

Below is the accompanying shell script. I have named it train.sh:

#!/bin/bash

#SBATCH -t 00:50:00
#SBATCH -o result_%j.out
#SBATCH -e result_%j.err
#SBATCH -p gpua100

ml GCC/11.3.0
ml OpenMPI/4.1.4
ml DeepLabCut/2.3.6-CUDA-11.7.0

python train.py

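The lines starting with #SBATCH tell SLURM how to run the job: -t sets the time limit (here 50 minutes), -o and -e name the files that receive standard output and standard error, and -p selects the partition (here gpua100). See the LUNARC documentation for further explanations.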

To submit the job to SLURM, type

sbatch train.sh

The information that would normally be printed to the terminal will instead be written to the files result_%j.out and result_%j.err, where %j is replaced by the ID number of the job.
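If you submit jobs from Python rather than by hand, the job ID can be read from the message that sbatch prints ("Submitted batch job" followed by the ID) and used to work out the names of the two result files. The following is a minimal sketch; `parse_job_id`, `result_files` and `submit` are hypothetical helper names, and the file name pattern is the one from train.sh above.

```python
import re
import subprocess
from pathlib import Path


def parse_job_id(sbatch_output: str) -> int:
    """Extract the job ID from sbatch's 'Submitted batch job <id>' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"Unexpected sbatch output: {sbatch_output!r}")
    return int(match.group(1))


def result_files(job_id: int, directory: Path = Path(".")):
    """Return the (.out, .err) paths produced by the -o and -e lines in train.sh."""
    return directory / f"result_{job_id}.out", directory / f"result_{job_id}.err"


def submit(script: str = "train.sh") -> int:
    """Run sbatch on the script and return the ID of the new job."""
    done = subprocess.run(
        ["sbatch", script], capture_output=True, text=True, check=True
    )
    return parse_job_id(done.stdout)
```

Calling `submit()` from the directory that contains train.sh returns the job ID, and `result_files` with that ID gives the two files to inspect once the job has started. The files are assumed to land in the directory from which the job was submitted, since train.sh gives them as relative paths.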