# docker, cluster, mlops

## Docker

| Command | Action |
|---|---|
| `docker run hello-world` | Sanity check. |
| `docker ps [-a]` | List containers (`-a` includes stopped ones). |
| `docker image ls` | List images. |
| `docker run -it <image-name>:<tag> [--name <container-name>]` | Run an interactive shell within an image, optionally naming the container (for later reference). |
| `docker commit <container-id> <new-image-name>` | Commit a container to a new image; the container id can be taken from `docker ps`. |
| `docker start <container-name> && docker exec -it <container-name> <command>` | Restart a container and run a command within it. |
| `docker cp <container>:<src-path> <host-dest-path>` / `docker cp <host-src-path> <container>:<dest-path>` | Copy files between host and containers. |
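For example, a typical interactive workflow using these commands might look like this (the image tag and the name `sandbox` are illustrative):

```bash
# Start an interactive Ubuntu container named "sandbox" (hypothetical name)
docker run -it --name sandbox ubuntu:22.04

# ... work inside the container, then exit ...

# Snapshot the container's current state into a new image
docker commit sandbox my-sandbox-image:v1

# Copy a file out of the (possibly stopped) container to the host
docker cp sandbox:/etc/hostname ./hostname-copy.txt
```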

## Docker Contexts

| Command | Use | Additional Notes |
|---|---|---|
| `docker context ls` | List all Docker contexts | Shows name, type, endpoints, and the active context |
| `docker context use <name>` | Set the active Docker context | Switches Docker to use the specified context |
| `docker context show` | Display the current context | Prints the name of the currently active context |
| `docker context inspect <name>` | View details of a specific context | Outputs detailed config including endpoints and TLS |
| `docker context create <name>` | Create a new Docker context | Use `--docker` to specify host, TLS options, etc. |
| `docker context rm <name>` | Remove a Docker context | Cannot remove the currently active context |
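As a sketch, a context pointing at a remote Docker engine over SSH could be created and used like this (the context name and host are hypothetical):

```bash
# Create a context whose endpoint is a remote engine reachable over SSH
docker context create remote-box --docker "host=ssh://user@build-server.example.com"

# Switch to it; subsequent docker commands now run against the remote engine
docker context use remote-box
docker ps

# Switch back to the local engine
docker context use default
```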

## Docker compose

| Command | Action | Additional Notes |
|---|---|---|
| `docker compose up [-d] [--build] [--force-recreate] [-V]` | Starts all services | `-d` for detached mode; `--build` forces rebuilding images; `--force-recreate` forces recreating containers |
| `docker compose down [-v]` | Stops and removes all services | `-v` removes volumes; `--rmi all` removes images; also removes networks defined in the file |
| `docker compose stop` | Stops running containers | Containers can be restarted |
| `docker-compose [--env-file <path>.env] [--project-directory <directory>] -f docker-compose.yml up --force-recreate -V -d <service-name>` | Restarts a service, recreating its container | The `-V` flag is necessary to also recreate the container's anonymous volumes (container storage) |
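A typical development loop with these commands might look as follows (the service name `web` is hypothetical):

```bash
# Build (if needed) and start all services in the background
docker compose up -d --build

# Follow the logs of a single service
docker compose logs -f web

# Recreate just that service, including its anonymous volumes
docker compose up -d --force-recreate -V web

# Tear everything down, including named volumes
docker compose down -v
```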

## Run without sudo

https://askubuntu.com/questions/477551/how-can-i-use-docker-without-sudo

```bash
# Add the docker group if it doesn't already exist:
sudo groupadd docker

# Add the connected user "$USER" to the docker group. Change the user name to match
# your preferred user if you do not want to use your current user:
sudo gpasswd -a $USER docker
```

Either run `newgrp docker` or log out and back in to activate the changes to groups.

```bash
docker run hello-world
```

to check that docker runs without sudo.

## Get graphical access using VNC

http://blog.fx.lv/2017/08/running-gui-apps-in-docker-containers-using-vnc/

Copied here in case the above page becomes unavailable:

1. Run the container, exposing the VNC port to the host:

   ```bash
   docker run -p 5901:5901 -t -i ubuntu
   ```

2. Install a VNC server within the container and run it:

   ```bash
   sudo apt-get install vnc4server
   vnc4server -geometry 1400x1000
   ```

3. Export the `$DISPLAY` environment variable:

   ```bash
   export DISPLAY=<the display name from the vnc server output, e.g. :1>
   ```

4. Run the GUI app (see the sketch below).

5. Connect using a VNC client to the exported port (`127.0.0.1:5901` in the above snippet).
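For step 4, a minimal sanity check could be installing and launching a simple X client (assuming an Ubuntu container; `xeyes` ships in the `x11-apps` package):

```bash
# Inside the container, with DISPLAY pointing at the VNC server's display
sudo apt-get install -y x11-apps
xeyes &   # any X client should now render into the VNC session
```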

## NVIDIA container toolkit

Taken from here.

First, make sure that the NVIDIA driver is installed and recognizes the GPU (e.g. by running `nvidia-smi`).

```bash
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)

# NOTE: apt-key is deprecated and produces a warning as of Ubuntu 22.04;
# this will need to be modified to use gpg instead (see the sketch below).
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
```
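A possible gpg-based variant of the above (an untested sketch; the keyring path is a choice, not mandated by NVIDIA):

```bash
# Store the key in a dedicated keyring instead of the deprecated apt-key store
curl -fsSL https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-docker.gpg

# Rewrite the repo list so that each entry is pinned to that keyring
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-docker.gpg] https://#' | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
```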

Install nvidia-docker2:

```bash
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
```

Run a base image:

```bash
docker run -it --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
```

Image alternatives:

- `base`: minimal option with the essential CUDA runtime
- `runtime`: more fully-featured option that includes the CUDA math libraries and NCCL for cross-GPU communication
- `devel`: everything from `runtime`, plus headers and development tools for creating custom CUDA images

The image can then be used as the base in a Dockerfile:

```dockerfile
FROM nvidia/cuda:11.4.0-base-ubuntu20.04
RUN apt-get update
RUN apt-get install -y python3 python3-pip
RUN pip3 install tensorflow-gpu
COPY tensor-code.py .
ENTRYPOINT ["python3", "tensor-code.py"]
```
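Building and running such an image could then look like this (the tag name is arbitrary):

```bash
docker build -t tensor-app .
docker run --gpus all tensor-app
```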

If you need to use a different base image, you can manually add CUDA support; see the link above or https://stackoverflow.com/questions/25185405/using-gpu-from-a-docker-container/64422438#64422438.

## ClearML

Setting up a server (using docker): https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_linux_mac, then bringing it up.

The server starts on http://localhost:8080. Go to the profile page (top-right button, or http://localhost:8080/profile) → add credentials → copy them as input into `clearml-init` (below).
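Per the linked guide, bringing the server up boils down to fetching its compose file and starting it (the path and URL follow the guide's defaults at the time of writing; check the current docs):

```bash
# Fetch the server's compose file into /opt/clearml (location used by the guide)
sudo curl -o /opt/clearml/docker-compose.yml \
  https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose.yml

# Bring up all ClearML server services in the background
sudo docker-compose -f /opt/clearml/docker-compose.yml up -d
```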

Locally:

```bash
pip install clearml
clearml-init
```

## Running LLMs locally

Using Ollama and Open WebUI.

```bash
docker pull ghcr.io/open-webui/open-webui:main
docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
```
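Open WebUI needs an LLM backend; a common pairing is an Ollama container running alongside it (the model name below is just an example):

```bash
# Run the Ollama server with a named volume for downloaded models
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull a model into it
docker exec -it ollama ollama pull llama3
```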

## LSF

| Command | Description |
|---|---|
| `bsub` | Submit a job. Accepts either full arguments or a `.bsub` script file. |
| `bsub -b "HH:MM" ...` | Pre-schedule a job to the specified time. |
| `bjobs` | List user jobs. `bjobs -l <job id>` displays details about a job. Use `-w` or `-W` for untruncated output. |
| `bkill <job id>` | Kill a job (the `-r` flag force-kills immediately). |
| `battach -L /bin/bash <job id>` | Attach to a running interactive session. |
| `blimits -u <username>` | Check the compute resource quota for a user. |
| `bqueues [-l <queue name>]`, `qstat` | Show available queues and their running/pending job counts. |
| `btop <job id>` | Move a pending job to the top of the (per-user) scheduling order. |
| `bpeek <job id>` | View stdout from a job. `-f` shows output continuously (like `tail -f`). |
| `bmgroup [-w <group name>]` | Show which hosts belong to a queue's host group. `bmgroup -w <queue name> \| tr ' ' '\n' \| grep rng \| wc -l` counts the hosts on a queue. |
| `bhosts` | Show the hosts list. |
| `bmod -M <memory limit> -W <runtime limit> <other args>` | Modify job resource and run-time limits. NOTE: this tends to reset other settings such as affinity. |
| `bstop <job id>` / `bresume <job id>` | Suspend / resume a job; can be used on running or pending jobs. |
| `squota <directory>` | Show quota usage information for the containing network share. |
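Since `bsub` accepts a script file, a minimal `.bsub` script might look like this (queue and resource values are illustrative); submit it with `bsub < job.bsub`:

```bash
#!/bin/bash
#BSUB -J my-train-job       # job name (illustrative)
#BSUB -q gpu-queue          # queue name (site-specific)
#BSUB -n 4                  # number of slots
#BSUB -W 04:00              # runtime limit (HH:MM)
#BSUB -gpu "num=1"          # request one GPU
#BSUB -o logs/%J.out        # stdout; %J expands to the job id
#BSUB -e logs/%J.err        # stderr

python3 train.py
```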

## Modules

| Module | Description |
|---|---|
| `iquota`, `quota_advisor` | Quota |
| `ncdu` | Disk usage analyzer (ncurses) |
| `mc`, `tmux`, `gcc`, `boost`, `cuda`, `conda` | File manager, terminal multiplexer, and common toolchains/environments |

## Other utilities

| Command | Description |
|---|---|
| `/usr/lpp/mmfs/bin/mmlsquota -j <drive> --block-size G rng-gpu01` | Check quota (GPFS / IBM Spectrum Scale) |
| `/usr/lpp/mmfs/bin/mmlsattr -L <drive>` | Check quota |

## Tricks

| Command | Description |
|---|---|
| `bsub -gpu='num=<num gpus>:mode=exclusive_process:mps=no:aff=no'` | Schedules quicker than without the arguments. |
| `bsub -gpu='num=<num gpus>:mig=<mig slices>'` | MIG usage (A100, H100 and newer). |
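Putting the first trick together with an interactive job, a quick GPU shell might be requested like this (the queue name is hypothetical):

```bash
bsub -Is -q gpu-queue \
     -gpu "num=1:mode=exclusive_process:mps=no:aff=no" \
     /bin/bash
```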