CINECA ENROOT Quick Start Guide - gfiameni/nvdoc-italy GitHub Wiki
Usage
Usage: enroot COMMAND [ARG...]
Command line utility for manipulating container sandboxes.
Commands:
batch [options] [--] CONFIG [COMMAND] [ARG...]
bundle [options] [--] IMAGE
create [options] [--] IMAGE
exec [options] [--] PID COMMAND [ARG...]
export [options] [--] NAME
import [options] [--] URI
list [options]
remove [options] [--] NAME...
start [options] [--] NAME|IMAGE [COMMAND] [ARG...]
version
Commands
Refer to the official documentation.
Example
Create the environment (run this once, on either a login node or a compute node)
$ export ENROOT_CACHE_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-cache/group-$(id -g)
$ export ENROOT_DATA_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-data/user-$(id -u)
$ export ENROOT_RUNTIME_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-runtime/user-$(id -u)
$ export ENROOT_MOUNT_HOME=y NVIDIA_DRIVER_CAPABILITIES=all
# Import the PyTorch 21.11 image from NVIDIA GPU Cloud
$ enroot import -o pytorch.21.11-py3.sqsh "docker://@nvcr.io#nvidia/pytorch:21.11-py3"
# Create a container out of it
$ enroot create --name pytorch2111 pytorch.21.11-py3.sqsh
$ enroot list
pytorch2111
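The path exports above are repeated in every step of this guide. One way to avoid retyping them is a small file that is sourced from both the interactive shell and the job script. The following is a minimal sketch, not part of the original setup: the file name `enroot_env.sh` and the function name `enroot_env` are hypothetical. The fallback to a temporary directory when `CINECA_SCRATCH` is unset is an assumption, added only so the snippet can be tried off-cluster.

```shell
# enroot_env.sh (hypothetical file name): source it, then call enroot_env.
enroot_env() {
    # Assumption: CINECA_SCRATCH is already set on CINECA systems.
    # The temporary-directory fallback is only for trying this elsewhere.
    _scratch="${CINECA_SCRATCH:-${TMPDIR:-/tmp}/scratch-$(id -u)}"
    _base="$_scratch/enroot/tmp"

    # Same values as the exports shown in this guide.
    export ENROOT_CACHE_PATH="$_base/enroot-cache/group-$(id -g)"
    export ENROOT_DATA_PATH="$_base/enroot-data/user-$(id -u)"
    export ENROOT_RUNTIME_PATH="$_base/enroot-runtime/user-$(id -u)"
    export ENROOT_MOUNT_HOME=y NVIDIA_DRIVER_CAPABILITIES=all

    # Creating the directories up front is a choice of this sketch;
    # it is harmless if they already exist.
    mkdir -p "$ENROOT_CACHE_PATH" "$ENROOT_DATA_PATH" "$ENROOT_RUNTIME_PATH"
}
```

With this in place, `. ./enroot_env.sh && enroot_env` would stand in for the four export lines wherever they appear.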
Install a package into the container (you need to access a compute node first)
$ export ENROOT_CACHE_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-cache/group-$(id -g)
$ export ENROOT_DATA_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-data/user-$(id -u)
$ export ENROOT_RUNTIME_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-runtime/user-$(id -u)
$ export ENROOT_MOUNT_HOME=y NVIDIA_DRIVER_CAPABILITIES=all
$ enroot start --root --env NVIDIA_DRIVER_CAPABILITIES --rw pytorch2111
$root pip install mypackage
$root exit
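The same installation can probably be done without an interactive session, because `enroot start` accepts a command after the container name (see the usage summary at the top). A minimal sketch follows, assuming `pip` is on the container's default path. The helper name `install_in_container` and the `DRY_RUN` variable are hypothetical. With `DRY_RUN=1` the command is printed rather than executed, which makes it easy to inspect before running it on a compute node.

```shell
# Hypothetical helper: install pip packages into a named enroot container.
# Usage: install_in_container CONTAINER PACKAGE...
install_in_container() {
    if [ "$#" -lt 2 ]; then
        echo "usage: install_in_container CONTAINER PACKAGE..." >&2
        return 2
    fi
    _container="$1"
    shift
    # Same options as the interactive session above; pip is passed as
    # the command to run inside the container.
    set -- enroot start --root --env NVIDIA_DRIVER_CAPABILITIES --rw \
        "$_container" pip install "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"   # print the command instead of running it
    else
        "$@"
    fi
}
```

The `--rw` flag matters here as in the interactive case: without a writable root filesystem the installed package would not persist in the container.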
SLURM job script example (saved here as submit_pytorch.sh)
#!/bin/bash
#SBATCH -A <account_name>
#SBATCH -p dgx_usr_prod
#SBATCH --time 08:00:00 # format: HH:MM:SS
#SBATCH -N 1 # 1 node
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1 # 1 GPU per node out of 8
#SBATCH --mem=128G # memory per node
#SBATCH --job-name=my_job
#SBATCH --mail-type=ALL
#SBATCH --cpus-per-task=32
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
export ENROOT_CACHE_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-cache/group-$(id -g)
export ENROOT_DATA_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-data/user-$(id -u)
export ENROOT_RUNTIME_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-runtime/user-$(id -u)
export ENROOT_MOUNT_HOME=y NVIDIA_DRIVER_CAPABILITIES=all
enroot start \
    --mount /dgx_scratch/userexternal/code:/workspace/code \
    --mount /raid/DATASETS_AI/epic-kitchens/epic-kitchens-100:/workspace/data/ek-100 \
    --root --env NVIDIA_DRIVER_CAPABILITIES --rw \
    pytorch2111 sh -c 'cd /workspace/code; python train.py; wait'
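A job that waits in the queue and then fails on a missing `--mount` source wastes the allocation. A small pre-flight check can catch this early. The following is a minimal sketch, assuming mounts are written as `SRC:DST` as in the command above. The function name `check_mounts` is hypothetical and not part of enroot.

```shell
# Hypothetical helper: verify that the host side of each SRC:DST mount exists.
# Returns 0 if every source path exists, 1 otherwise.
check_mounts() {
    _status=0
    for _spec in "$@"; do
        _src="${_spec%%:*}"   # text before the first colon is the host path
        if [ ! -e "$_src" ]; then
            echo "missing mount source: $_src" >&2
            _status=1
        fi
    done
    return "$_status"
}
```

In the job script it could be called just before `enroot start`, for example `check_mounts /dgx_scratch/userexternal/code:/workspace/code || exit 1`, so the job stops with a clear message instead of an error from inside the container runtime.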
Submit the job
$ sbatch submit_pytorch.sh
Remove the container
$ enroot remove pytorch2111