docker, cluster, mlops - feliyur/exercises GitHub Wiki

Docker

| command | description |
|---|---|
| `docker run hello-world` | Sanity check |
| `docker ps [-a]` | List containers |
| `docker image ls` | List images |
| `docker run -it <image-name>:<tag>` | Run a command prompt within an image |
| `docker commit <container id> <new image name>` | Save a container as a new image; the container ID can be taken from `docker ps` |
| `docker start <container name> && docker exec -it <container name> <command>` | Restart a container and run a command within it |

Run without sudo:

https://askubuntu.com/questions/477551/how-can-i-use-docker-without-sudo

# Add the docker group if it doesn't already exist:
sudo groupadd docker

# Add the connected user "$USER" to the docker group. Change the user name to match
# your preferred user if you do not want to use your current user:
sudo gpasswd -a $USER docker

Either do a newgrp docker or log out/in to activate the changes to groups.

docker run hello-world

to check that docker runs without sudo.
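
The group change can also be confirmed before touching docker itself. A minimal sketch, assuming bash with `id` available; the helper name `in_group` is made up for this note:

```shell
#!/usr/bin/env bash
# in_group GROUP: succeed if the current shell session carries GROUP.
# `id -nG` without a user argument reports the groups of the running
# process, so it only shows docker once newgrp or a re-login took effect.
in_group() {
    id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

if in_group docker; then
    echo "docker group active: sudo should not be needed"
else
    echo "not in docker group yet: run 'newgrp docker' or log out and back in"
fi
```

Calling `id -nG` without a user name is deliberate: with a user name it reads the group database, which already lists docker even when the current session has not picked up the change.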

Get graphical access using VNC

http://blog.fx.lv/2017/08/running-gui-apps-in-docker-containers-using-vnc/

nvidia container toolkit

Taken from here.

First, make sure that the NVIDIA driver is installed and recognizes the GPU (e.g. by running nvidia-smi).

$ distribution=$(. /etc/os-release;echo  $ID$VERSION_ID)  

# NOTE: apt-key is deprecated and will produce a warning as of Ubuntu 22.04. Will need to modify this to use gpg command instead
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -  
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
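
The apt-key note above can be addressed by storing the key in its own keyring and pointing the repository entry at it with `signed-by`. A hedged sketch that reuses the URLs from the commands above; the keyring path and the helper name `add_signed_by` are assumptions, and NVIDIA's current install instructions may use different repository URLs:

```shell
#!/usr/bin/env bash
# Assumed location for the dearmored key; adjust as needed.
KEYRING=/usr/share/keyrings/nvidia-docker-keyring.gpg

# Rewrite "deb https://..." lines read from stdin so that apt verifies
# them against the given keyring only.
add_signed_by() {
    sed "s#^deb https://#deb [signed-by=$1] https://#"
}

# Intended use (not executed here, needs root and network access):
#   distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
#   curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey \
#     | sudo gpg --dearmor -o "$KEYRING"
#   curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list \
#     | add_signed_by "$KEYRING" \
#     | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# The rewrite applied to a sample line:
echo 'deb https://example.invalid/repo /' | add_signed_by "$KEYRING"
```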

Install nvidia-docker2:

sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

Run a base image

docker run -it --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

Image alternatives:

  • base: minimal option with essential cuda runtime
  • runtime: more fully-featured option that includes the CUDA math libraries and NCCL for cross-GPU communication
  • devel: everything from runtime as well as headers and development tools for creating custom CUDA images
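
The tags used on this page follow a `<cuda version>-<flavor>-<os>` pattern. A small sketch that assembles such a reference and rejects flavors other than the three listed above; the helper name `cuda_image` is hypothetical, and not every combination is guaranteed to exist in the registry:

```shell
#!/usr/bin/env bash
# cuda_image VERSION FLAVOR OS: print an image reference such as
# nvidia/cuda:11.4.0-base-ubuntu20.04; fail on an unknown flavor.
cuda_image() {
    local version="$1" flavor="$2" os="$3"
    case "$flavor" in
        base|runtime|devel) ;;
        *) echo "unknown flavor: $flavor" >&2; return 1 ;;
    esac
    echo "nvidia/cuda:${version}-${flavor}-${os}"
}

cuda_image 11.4.0 runtime ubuntu20.04
```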

The image can then be used as the base in a Dockerfile:

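FROM nvidia/cuda:11.4.0-base-ubuntu20.04
# Refresh the package index and install Python in a single layer
RUN apt-get update && apt-get install -y python3 python3-pip
# The tensorflow-gpu package is deprecated on PyPI; the plain tensorflow package includes GPU support
RUN pip3 install tensorflow
COPY tensor-code.py .
ENTRYPOINT ["python3", "tensor-code.py"]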

If a different base image is needed, CUDA support can be added manually; see the link above or https://stackoverflow.com/questions/25185405/using-gpu-from-a-docker-container/64422438#64422438

ClearML

Set up a server (using docker) and bring it up: https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_linux_mac

The server starts on http://localhost:8080. Go to the profile page (top-right button, or http://localhost:8080/profile), add credentials, and copy them as input into clearml-init (below).

Locally:

pip install clearml
clearml-init
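
clearml-init stores the pasted credentials in a configuration file. For scripted or containerized runs, the same settings can be supplied through environment variables instead. A sketch assuming the default ports of a local ClearML server (web 8080, API 8008, files 8081); the key values are placeholders, not real credentials:

```shell
#!/usr/bin/env bash
# Endpoints of a local server; the ports are the assumed defaults.
export CLEARML_WEB_HOST="http://localhost:8080"
export CLEARML_API_HOST="http://localhost:8008"
export CLEARML_FILES_HOST="http://localhost:8081"

# Placeholders: paste the values generated on the profile page.
export CLEARML_API_ACCESS_KEY="<access key>"
export CLEARML_API_SECRET_KEY="<secret key>"

# Show what is set, masking the key values.
env | grep '^CLEARML_' | sed 's/\(KEY=\).*/\1***/' | sort
```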

LSF

| command | description |
|---|---|
| `bsub` | Submit a job. Can either provide full arguments or a .bsub script file. |
| `bjobs` | List user jobs. `bjobs -l <job id>` displays details about a job. Use `-w` or `-W` for untruncated output. |
| `bkill <job id>` | Kill a job (`bkill -l` only lists the supported signal names). |
| `battach -L /bin/bash <job id>` | Attach to a running interactive session. |
| `blimits -u <username>` | Check the compute resource quota for a user. |
| `bqueues`, `qstat` | Show available queues and their running / pending job counts. |
| `btop` | Move a pending job to the top of the (per user) scheduling order. |
| `bpeek <job id>` | View stdout from a job. `-f` uses tail on the output. |
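
A .bsub script file is an ordinary shell script whose `#BSUB` comment lines carry the submission options. A minimal sketch that writes one; the job name, file names, and queue placeholder are made up for illustration and need adapting to the cluster:

```shell
#!/usr/bin/env bash
# Write a minimal job script. bsub reads the #BSUB lines as options;
# to bash they are plain comments.
cat > job.bsub <<'EOF'
#!/bin/bash
# Job name (hypothetical)
#BSUB -J example-job
# Queue: replace the placeholder with one listed by bqueues
#BSUB -q <queue name>
# Number of job slots
#BSUB -n 1
# stdout and stderr files; %J expands to the job id
#BSUB -o example-job.%J.out
#BSUB -e example-job.%J.err

echo "running on $(uname -n)"
EOF

# Submit by feeding the script on stdin so the #BSUB lines are parsed:
#   bsub < job.bsub
# then follow it with bjobs and bpeek -f <job id>.
echo "wrote job.bsub"
```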

Modules

command module description
iquota, quota_advisor quota
ncdu
mc, tmux, gcc, boost, cuda, conda

Other utilities

| command | description |
|---|---|
| `/usr/lpp/mmfs/bin/mmlsquota -j <drive> --block-size G rng-gpu01`, `/usr/lpp/mmfs/bin/mmlsattr -L <drive>` | Check quota |