# Singularity

We often use Singularity to run our Docker containers on HPC systems (e.g. MARCC, JHPCE, Stampede2).

This is typically the only way we can run containers on HPC systems, since Singularity, unlike Docker, doesn't require root/sudo to start a container.

## Singularity from Docker images

Singularity can pull from a Docker registry (e.g. quay.io) and build a Singularity version of the Docker container image.

Example:

```
singularity pull docker://quay.io/benlangmead/recount-rs5:0.9.0
```

This will generate a .simg file under the older Singularity version that MARCC runs (2.6.0-dist), or a .sif file under the newer Singularity version that JHPCE runs (3.3.0-rc.1.72.g58018d3).

Newer versions of Singularity are backward compatible with the older .simg files; I'd guess the reverse (running a newer .sif under an older Singularity) doesn't work, but I haven't tried it.
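Once pulled, the image can be run directly with singularity exec. A minimal sketch (the filename below assumes Singularity 3.x's default naming for the tag above, and the bind path is a placeholder; adjust both to whatever your pull actually produced and where your data lives):

```bash
# Run a trivial command inside the pulled image
# (filename assumes Singularity 3.x default naming: <name>_<tag>.sif)
singularity exec recount-rs5_0.9.0.sif echo "hello from inside the container"

# Bind a host directory into the container when it needs access to host data
singularity exec --bind /path/to/data:/data recount-rs5_0.9.0.sif ls /data
```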

## MARCC-specific issues

If you try to build or run Singularity on the langmead-bigmem* nodes, it will fail; it's not set up properly there as of 2019-09-10.

Use the login nodes (primarily bc-login02) to build docker/singularity images that can then be run on the compute nodes. Or, request a compute node and build there.
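A typical login-node build might look like the sketch below; the module load line is an assumption, so check what MARCC actually provides (e.g. with module avail singularity):

```bash
# On bc-login02; the module name is an assumption, use whatever `module avail` shows
module load singularity
singularity pull docker://quay.io/benlangmead/recount-rs5:0.9.0
# Put the resulting .simg/.sif somewhere the compute nodes can read (e.g. scratch)
```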

If you try to build an image from a Docker repo on bc-login03 after loading Singularity, you may see problems such as:

```
Exploding layer: sha256:27b1fa8a1a104c45d87908cb490dbf1ecfb412869d5e8c203f3f8c44b71989a3.tar.gz
ERROR  : tar extraction error: Write failed
WARNING: Warning handling tar header: Write failed
```

resulting in a *.simg file that is noticeably smaller than it should be (i.e. truncated).

Use bc-login02 to build docker/singularity images if you get this error.
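A quick sanity check after a build (a sketch; substitute the filename your pull actually produced):

```bash
# Check the image size; a truncated build is noticeably smaller than a good one
ls -lh recount-rs5-0.9.0.simg

# A trivial exec is a cheap way to confirm the image is usable at all
singularity exec recount-rs5-0.9.0.simg true \
    || echo "image appears broken; rebuild on bc-login02"
```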

## Singularity and [S]GE

I've noticed after a recent upgrade to CentOS 7 on JHPCE that it's become more difficult to run Singularity containers under grid engine (qrsh) sessions.

Grid engine appears to limit memory in a way that causes Singularity to crash unless the session is started with a very large allocation (~100 GB to build a Docker => Singularity image). This didn't appear to be the case before the upgrade to CentOS 7.

This is probably down to a combination of 1) the image build needing a lot of memory and 2) JHPCE's zero tolerance for using any more threads/memory than requested.

The point being: if you're only running an image rather than building one, the memory requirement appears to be much lower (~57 GB vs. ~100 GB).
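As a rough sketch of a build-sized request (the exact numbers are assumptions derived from the ~100 GB figure above; tune them as needed):

```bash
# ~100 GB of virtual memory for an image build: 10 slots x 10G h_vmem
# (h_vmem and mem_free are per-slot, so totals scale with -pe local N)
qrsh -pe local 10 -l h_vmem=10G,mem_free=9G,h_fsize=100G
```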

For now, if you continue to see problems running Singularity in a grid engine environment (e.g. JHPCE), try requesting more memory via -l h_vmem=xG; note that the total virtual memory you get is x multiplied by the number of slots you request via -pe local N, where N is the number of threads you need.

An example qrsh that should work for running the Monorail pipeline image with 3 jobs of 8 threads each:

```
qrsh -pe local 30 -l h_vmem=2G,mem_free=1.9G,h_fsize=100G
```

This will schedule a node with >= 30 free cores and >= 57 GB of free memory (30 slots x 1.9G mem_free).
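Inside that session the container runs as usual; a minimal sketch (the image filename, bind path, and run_monorail.sh wrapper are placeholders, not part of the actual pipeline setup):

```bash
# Inside the qrsh session; paths, filename, and run_monorail.sh are placeholders
singularity exec --bind /scratch/users/myuser:/work recount-rs5_0.9.0.sif \
    bash -c "cd /work && ./run_monorail.sh"
```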