CINECA ENROOT Quick Start Guide

Usage

Usage: enroot COMMAND [ARG...]

Command line utility for manipulating container sandboxes.

 Commands:
   batch  [options] [--] CONFIG [COMMAND] [ARG...]
   bundle [options] [--] IMAGE
   create [options] [--] IMAGE
   exec   [options] [--] PID COMMAND [ARG...]
   export [options] [--] NAME
   import [options] [--] URI
   list   [options]
   remove [options] [--] NAME...
   start  [options] [--] NAME|IMAGE [COMMAND] [ARG...]
   version

Commands

Refer to the official ENROOT documentation (https://github.com/NVIDIA/enroot) for a detailed description of each command and its options.

Example

Create the environment (this needs to be done only once, on either a login node or a compute node)

$ export ENROOT_CACHE_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-cache/group-$(id -g)
$ export ENROOT_DATA_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-data/user-$(id -u)
$ export ENROOT_RUNTIME_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-runtime/user-$(id -u)
$ export ENROOT_MOUNT_HOME=y NVIDIA_DRIVER_CAPABILITIES=all
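
These same exports are needed in every new shell (see the package-install and SLURM steps below). As a small convenience, here is a minimal sketch that stores them once in a helper file, so they can simply be sourced later; the file name enroot_env.sh is arbitrary, not part of the CINECA setup:

# Optional: keep the exports in a reusable helper script and source it in new shells
$ cat > $HOME/enroot_env.sh <<'EOF'
export ENROOT_CACHE_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-cache/group-$(id -g)
export ENROOT_DATA_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-data/user-$(id -u)
export ENROOT_RUNTIME_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-runtime/user-$(id -u)
export ENROOT_MOUNT_HOME=y NVIDIA_DRIVER_CAPABILITIES=all
EOF
$ source $HOME/enroot_env.sh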

# Import the PyTorch 21.11 image from NVIDIA GPU Cloud
$ enroot import -o pytorch.21.11-py3.sqsh "docker://@nvcr.io#nvidia/pytorch:21.11-py3"

# Create a container out of it
$ enroot create --name pytorch2111 pytorch.21.11-py3.sqsh
$ enroot list
pytorch2111

Install a package into the container (this must be done from a compute node; the ENROOT_* variables have to be exported again in the new shell)

$ export ENROOT_CACHE_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-cache/group-$(id -g)
$ export ENROOT_DATA_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-data/user-$(id -u)
$ export ENROOT_RUNTIME_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-runtime/user-$(id -u)
$ export ENROOT_MOUNT_HOME=y NVIDIA_DRIVER_CAPABILITIES=all

$ enroot start --root --env NVIDIA_DRIVER_CAPABILITIES --rw pytorch2111 

# Inside the container you now have a root shell: install the package, then exit
root# pip install mypackage
root# exit
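
If the modified container should be kept for later reuse, it can be exported back to a squashfs image with enroot export; the output file name below is only an example:

# Optional: snapshot the modified container into a new image
$ enroot export --output pytorch.21.11-py3-custom.sqsh pytorch2111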

SLURM job script example (saved here as submit_pytorch.sh)


#!/bin/bash
#SBATCH -A <account_name>
#SBATCH -p dgx_usr_prod
#SBATCH --time 08:00:00     # format: HH:MM:SS
#SBATCH -N 1                # 1 node
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1        # 1 GPU per node (out of 8)
#SBATCH --mem=128G          # memory per node
#SBATCH --job-name=my_job
#SBATCH --mail-type=ALL
#SBATCH --cpus-per-task=32

#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out

export ENROOT_CACHE_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-cache/group-$(id -g)
export ENROOT_DATA_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-data/user-$(id -u)
export ENROOT_RUNTIME_PATH=$CINECA_SCRATCH/enroot/tmp/enroot-runtime/user-$(id -u)
export ENROOT_MOUNT_HOME=y NVIDIA_DRIVER_CAPABILITIES=all

enroot start \
  --mount /dgx_scratch/userexternal/code:/workspace/code \
  --mount /raid/DATASETS_AI/epic-kitchens/epic-kitchens-100:/workspace/data/ek-100 \
  --root --env NVIDIA_DRIVER_CAPABILITIES --rw \
  pytorch2111 \
  sh -c 'cd /workspace/code; python train.py; wait'
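
Before submitting, it can be useful to check interactively (from a compute node, with the ENROOT_* variables exported, and assuming GPU support is enabled on that node) that the container sees the GPU and the mounted code directory; the paths are the same ones used in the script above and are only illustrative:

# Quick sanity check inside the container
$ enroot start --root --env NVIDIA_DRIVER_CAPABILITIES \
    --mount /dgx_scratch/userexternal/code:/workspace/code \
    pytorch2111 sh -c 'nvidia-smi; ls /workspace/code'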

Submit the job

$ sbatch submit_pytorch.sh
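
The job can then be monitored with standard SLURM commands; the output file name pattern matches the #SBATCH --output directive in the script above:

# Check the job state and, once it is running, follow its output
$ squeue -u $USER
$ tail -f job.<jobid>.out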

Remove the container

$ enroot remove pytorch2111
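
enroot remove deletes the unpacked container filesystem under ENROOT_DATA_PATH. To also reclaim the scratch space used in this guide, the imported image and the cache/runtime directories created above can be removed as well (double-check the paths before running rm -rf):

# Optional cleanup of the image and the ENROOT directories created earlier
$ rm -f pytorch.21.11-py3.sqsh
$ rm -rf $CINECA_SCRATCH/enroot/tmp/enroot-cache/group-$(id -g)
$ rm -rf $CINECA_SCRATCH/enroot/tmp/enroot-runtime/user-$(id -u)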