04. Working with Singularity - davidaray/Genomes-and-Genome-Evolution GitHub Wiki

As mentioned at the bottom of the last exercise, the 2023 iteration of the class ran into trouble with Conda, one of my favorite packages for installing software, because students varied in their ability to use it. Different people ended up installing different versions of software, and others couldn't install anything at all. To resolve that problem, I've decided to try something new.

Singularity containers are environments that behave like individualized, self-contained operating systems. They allow users to package and run software, data, libraries, and scientific workflows in a portable and reproducible way. In other words, you can build an operating system environment from scratch, set it up exactly the way you want it to work, and then share it with others, and it will work exactly the same way for them. Imagine a Singularity container as a box that holds all the software you need to run a specific task, along with that software's dependencies. Everything inside this box is isolated, so it works consistently no matter where you open it (on your laptop, an HPC cluster, etc.).

That's what I've done for this year's class in an attempt to get around the problems we experienced last year.

However, it requires a little bit of explanation.

Shells

First, let's talk about shells. A computer shell is a program that acts as an interface between a user and an operating system (OS). It allows users to interact with the computer by typing commands and executing them. A shell can be a command-line interface (CLI) or a graphical user interface (GUI). Some sources consider only CLI programs to be actual shells.

You've already worked with the shell extensively in the previous exercises. You used a shell called 'bash' to do all your work there. For most of the remaining work in this class, we'll use a combination of bash and the singularity container I've set up for you. There are two major ways to use the container. One is to use it as a shell and the other is to execute it using submission scripts. We'll do both.

Getting the container for this class

Let's use the bash shell to get you the singularity container you'll be using for this class. Set up a directory for the class in your /lustre/scratch directory using the following steps.

interactive -p nocona # grab a processor for interactive work

cd /lustre/scratch/[your eraider] # navigate to your scratch directory

mkdir -p gge2024/container # create a directory for this class along with a subdirectory for the container to live in

cp /lustre/scratch/daray/gge2024/container/gge_container_v4.sif gge2024/container # copy the container file from my scratch gge2024 directory to yours
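Because the file is large, it can be worth confirming that the copy finished. The following is a minimal sketch, not a required step. It assumes you are still in your scratch directory, and same_size is a hypothetical helper written for this example, not a Singularity or HPCC command.

```shell
# Hypothetical helper: succeed only if both files exist and have the same size in bytes.
same_size() {
    src="$1"
    dst="$2"
    [ -f "$src" ] && [ -f "$dst" ] || return 1
    [ "$(wc -c < "$src")" -eq "$(wc -c < "$dst")" ]
}

# Paths taken from the cp command above; DST is relative to your scratch directory.
SRC="/lustre/scratch/daray/gge2024/container/gge_container_v4.sif"
DST="gge2024/container/gge_container_v4.sif"

if same_size "$SRC" "$DST"; then
    echo "Copy looks complete: $(ls -lh "$DST")"
else
    echo "Copy is missing or incomplete; try the cp command again."
fi
```

A matching size does not prove the two files are identical, but it catches the most common problem, a copy that was interrupted partway through.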

The file is quite large but that's to be expected since you're essentially copying another computer onto your section of HPCC. Regardless, you should now have a copy of the container available to you to use throughout the class. This is my first time trying to teach the class this way. As a result, I may need to make changes to the container if we run into problems. I'll try to keep the number of changes to a minimum to reduce confusion.

The singularity container is called gge_container_v4.sif and any time you want to use it, you'll need to tell the HPCC computers that.

In the previous exercise you learned about shell variables. We will use variables extensively to make interacting with the container a little easier.

Mounting directories

One thing that makes working with containers more confusing is the need to "mount" directories. If you are new to high-performance computing (HPC), you can think of mounting directories for a Singularity container as a way to "connect" parts of your computer (like folders) to the virtual environment where your software or tools are running. This lets the software inside the container access your files, just as if they were part of its own system.

What does "mounting" mean?

Simple Analogy: Think of mounting as making a specific folder on your computer available to the container. It’s like telling the box, “Hey, you can access this specific drawer outside the box whenever you need something from it.”

Technical Explanation: When you mount a directory, you link a folder on your computer (or the HPC system) to a path inside the container. This way, the software inside the container can read, write, and manipulate files in that folder as if they were part of the container’s internal storage.

Why do we mount directories?

Accessing Data: Often, your input data and output results are stored in directories outside the container. By mounting these directories, the container can process your data and save results without copying everything into and out of the container.

Saving Time and Space: It avoids the need to duplicate large files and helps you work efficiently by directly accessing the necessary files.

How to mount a directory

Command Example: When you run a Singularity container, you can mount directories using the -B or --bind option.

singularity exec -B /path/to/your/data:/data my_container.sif my_command

Breakdown of the command:

singularity exec # This tells Singularity to run (execute) a command using the shell in the container.

-B /path/to/your/data:/data # This mounts the directory /path/to/your/data from your computer to the /data directory inside the container.

my_container.sif # This is your Singularity container file.

my_command # This is the command you want to run using the container's shell.

Outcome:

After running this, any files in /path/to/your/data on your computer can be accessed from /data inside the container.
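If more than one folder needs to be visible, the -B option accepts several source:destination pairs separated by commas. The sketch below only builds and prints such a command rather than running it, so it is safe to try anywhere. The paths, my_container.sif, and the helper name join_binds are placeholders made up for this example.

```shell
# Hypothetical helper: join any number of bind specifications with commas.
join_binds() {
    out=""
    for spec in "$@"; do
        if [ -z "$out" ]; then
            out="$spec"
        else
            out="$out,$spec"
        fi
    done
    printf '%s\n' "$out"
}

# Two placeholder folders, each mapped to a path inside the container.
BINDS=$(join_binds "/path/to/your/data:/data" "/path/to/your/results:/results")

# Print the command instead of running it (a dry run).
echo singularity exec -B "$BINDS" my_container.sif ls /data /results
```

Removing the echo would run the command for real, which requires that the container file and both folders actually exist.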

Things to keep in mind

Permissions: Make sure you have the right permissions to read or write in the directories you are mounting.

Paths Must Exist: The directory you are mounting must already exist on your computer, and the path you are mounting to inside the container (e.g., /data) must exist or be created within the container.
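Both points can be checked on your side before you call Singularity. This is a minimal sketch using ordinary shell tests; check_mount_source is a hypothetical helper and /path/to/your/data is the same placeholder used in the example above.

```shell
# Hypothetical helper: confirm a directory exists and that you can read and write in it.
check_mount_source() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
        echo "no read/write permission: $dir"
        return 1
    fi
    echo "ok: $dir"
}

# Replace the placeholder with the directory you plan to mount.
check_mount_source "/path/to/your/data" || echo "Fix this before using -B."
```

The helper only inspects the directory on your side. It says nothing about what exists inside the container.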

Summary

Think of mounting as connecting folders from your computer to the container.
It’s a way to share data between your computer and the container.
You mount directories so the container can access files outside its isolated environment.

Once you understand mounting as simply "connecting folders" between your system and the container, it becomes easier to work with Singularity on HPC systems.

Let's run a command using the container

First, let's set up two variables. One will be a path to the directory that holds the container, and the second will be a path to the directory we want to mount.

GGE_CONTAINER="/lustre/scratch/[eraider name]/gge2024/container"

WORKDIR="/lustre/scratch/[eraider name]/gge2024"
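If your HPCC login name is the same as your eRaider name (an assumption; you can check with whoami), the same two variables can be set without typing the name by hand. This is only an alternative sketch of the two lines above; ME is a hypothetical variable introduced for the example.

```shell
# Assumption: your login name matches the eRaider name used in your scratch path.
ME="${USER:-$(id -un)}"

GGE_CONTAINER="/lustre/scratch/$ME/gge2024/container"
WORKDIR="/lustre/scratch/$ME/gge2024"

# Print both so you can check them before using them.
echo "GGE_CONTAINER=$GGE_CONTAINER"
echo "WORKDIR=$WORKDIR"
```

Variables set this way last only for the current shell session, so they need to be set again each time you log in or start a new interactive session.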

Next, we need to create the test_directory. See Things to keep in mind (above). Assuming you're still in your /lustre/scratch directory, you can do this with

mkdir -p $WORKDIR/test_directory

Now, let's run a simple command to count from 1-100 using the container rather than the native operating system. The output from our count will be saved in the test_directory.

singularity exec -B $WORKDIR:$WORKDIR $GGE_CONTAINER/gge_container_v4.sif bash -c "for i in {1..100}; do echo \$i >> $WORKDIR/test_directory/test_output.txt; done"

If it helps you to read this, you could also input the command as:

singularity exec \
-B $WORKDIR:$WORKDIR \
$GGE_CONTAINER/gge_container_v4.sif \
bash -c "for i in {1..100}; do echo \$i >> $WORKDIR/test_directory/test_output.txt; done"

The "\" marks at the end of the first three lines tell the shell that the command continues on the next line. They allow you to break up each section of the command so they're easier to distinguish from one another.

Let's break this command down as we did before.

singularity exec # This tells Singularity to run a command using the shell in the container.

-B $WORKDIR:$WORKDIR # This mounts $WORKDIR, which contains the test_directory, to a directory in the container with the same name and path. Thus, any output will go to that directory and you can use the same paths inside and outside the container.

$GGE_CONTAINER/gge_container_v4.sif # This is the path to our Singularity container file.

bash -c "for i in {1..100}; do echo \$i >> $WORKDIR/test_directory/test_output.txt; done" # This is the command you want to run using the container's bash shell. It counts from 1 to 100 and appends each number to a file in $WORKDIR/test_directory. The backslash in \$i keeps your current shell from filling in $i before the command reaches the container, whereas $WORKDIR is filled in by your current shell. Note that the entire set of commands ("for", "do echo", "done") is wrapped in quotes. This is because Singularity expects only one command per call of exec. Wrapping all of the commands in the quotation marks takes care of that.
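The role of the backslash in \$i can be seen without the container at all. The sketch below uses plain bash -c, and the variable values "outer" and "inner" are made up for the demonstration.

```shell
i="outer"   # a value set in the shell you are typing into

# Unescaped: your current shell substitutes $i before the inner bash starts.
bash -c "i=inner; echo $i"     # prints: outer

# Escaped: the inner bash receives a literal $i and fills it in itself.
bash -c "i=inner; echo \$i"    # prints: inner
```

In the counting command, the loop variable is created inside the quoted command, so it has to be escaped in the same way as the second case.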

Check to see if the expected output file is present in the output directory.
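One way to do that check is sketched below. It assumes WORKDIR is still set in your current shell; summarize_output is a hypothetical helper written for this example.

```shell
# Hypothetical helper: report the line count plus the first and last lines of a file.
summarize_output() {
    f="$1"
    [ -f "$f" ] || { echo "not found: $f"; return 1; }
    echo "lines: $(wc -l < "$f" | tr -d ' ')"
    echo "first: $(head -n 1 "$f")"
    echo "last: $(tail -n 1 "$f")"
}

# Falls back to the current directory if WORKDIR is not set.
summarize_output "${WORKDIR:-.}/test_directory/test_output.txt" || true
```

After a single run of the counting command, the file should hold 100 lines, starting at 1 and ending at 100. Because >> appends rather than overwrites, running the counting command a second time produces 200 lines; delete test_output.txt first if you want a fresh start.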

Using the container shell directly

For the above exercise, we used the exec command. You can also interact directly with the container's shell using the (surprise!) shell command.

You still need to mount the appropriate directory. But afterward, you can run any set of commands the same way you would in the native bash shell.

For example, here's how you would count from 1-100 in the native bash shell. Try it.

for i in {1..100}; do echo $i; done

Here's how you would do the same thing using the container's bash shell.

singularity shell -B $WORKDIR:$WORKDIR $GGE_CONTAINER/gge_container_v4.sif

Note the change in the command prompt from something like cpu-23-26:/lustre/scratch/daray$ to Singularity>.

for i in {1..100}; do echo $i; done

The screen output is the same using either method.

To get out of the Singularity shell, simply type exit and press enter.
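If you lose track of which shell you are in and the prompt is not enough, one option is to look for an environment variable that Singularity sets inside a running container. This sketch assumes SINGULARITY_NAME is set there (and APPTAINER_NAME on systems running Apptainer, the renamed successor to Singularity); where_am_i is a hypothetical helper.

```shell
# Hypothetical helper: report whether this shell appears to be running inside a container.
where_am_i() {
    if [ -n "${SINGULARITY_NAME:-}" ] || [ -n "${APPTAINER_NAME:-}" ]; then
        echo "inside container: ${SINGULARITY_NAME:-$APPTAINER_NAME}"
    else
        echo "native shell"
    fi
}

where_am_i
```

A function defined in the native shell is not carried into the container's shell, so to use it at the Singularity> prompt you would need to paste the definition there as well.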

There is nothing to submit for this exercise but keep in mind that understanding what a singularity container is and how it works is vital to moving forward in the class.