05 Install Mamba and Environments - NU-CPGME/quest_genomics_2025 GitHub Wiki
February, 2025
Egon A. Ozer, MD PhD
Ramon Lorenzo Redondo, PhD
Mamba (a C++ implementation of the Python-based Conda software) is a software package manager that allows you to easily install much of the software required for this workshop on your computer. Software is installed into "environments" that can be activated and deactivated from the command line. By setting up Mamba/Conda environments you can have multiple versions of the same software on one computer and avoid conflicts between different versions of software packages or incompatible software. This also allows you to install a piece of software, along with any prerequisite programs it needs to run, in one step. Another major advantage of Mamba/Conda (and a reason why we'll be using it for this workshop) is that you can be sure that everyone is using the same version of each software application regardless of when they download it and what computer they are using (good for reproducibility).
For a nice introduction to the basic functionality and commands of Mamba/Conda (essentially the same for both), see this tutorial or here for more detail.
If you ever find yourself wanting to run analyses on your own computer or a local workstation, you should install Mamba locally. You can use either micromamba, which is a very stripped-down, easy-to-install, and fast version of Mamba, or the full Mamba.
Do not install both micromamba and mamba! Pick one or the other.
Installing micromamba should be a snap:
(This command automatically picks all the default options)
sudo apt install curl
"${SHELL}" < <(curl -L micro.mamba.pm/install.sh)
source ~/.bashrc
Full Mamba/Conda installation instructions can be found on GitHub. This is a perfectly reasonable approach, as the full version has more features than micromamba.
You only need to run these commands once. After you've set up conda this way, it will be available every time you log into Quest.
First, make sure you don't already have mamba set up on Quest. The following commands should give you no output. If either of them does print something, let us know so we can double-check that your installation is appropriate.
which conda
which mamba
If you got no output with the above commands, then run the following commands:
module load mamba
conda init
After this, either exit and log back in to Quest or type source ~/.bashrc
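If you are not sure whether conda init worked, it may help to know that for bash it adds a block of start-up code to your ~/.bashrc, between the comment lines # >>> conda initialize >>> and # <<< conda initialize <<<. Below is a minimal sketch of a check for that marker. It assumes your login shell is bash, and the function name has_conda_init_block is our own invention, not part of conda.

```bash
# Hypothetical helper: succeed if the given file (default: ~/.bashrc)
# contains the start marker of the block that 'conda init' writes.
has_conda_init_block() {
  local rcfile="${1:-$HOME/.bashrc}"
  grep -qF '# >>> conda initialize >>>' "$rcfile" 2>/dev/null
}

# Example: report on your own ~/.bashrc
if has_conda_init_block; then
  echo "conda init block found in ~/.bashrc"
else
  echo "no conda init block in ~/.bashrc; try running 'conda init' again"
fi
```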
If your command prompt now looks like this
(base)[abc123@quser32 ~]$
then run the following command to keep Conda from starting its base environment every time you log in:
conda config --set auto_activate_base false
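You can also check at any point which of these tools your current shell can see. Before setup, none should be found; after conda init and reloading ~/.bashrc, conda should be. This is a sketch with a hypothetical helper name, check_conda_tools. It uses the shell builtin command -v, which finds programs on your PATH and also shell functions, such as the conda function that conda init sets up.

```bash
# Hypothetical helper: report whether conda, mamba, and micromamba can be
# found in this shell. 'command -v' is a shell builtin that finds programs
# on PATH and also shell functions (conda init defines 'conda' as one).
check_conda_tools() {
  local tool
  for tool in conda mamba micromamba; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found ($(command -v "$tool"))"
    else
      echo "$tool: not found"
    fi
  done
}

check_conda_tools
```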
Enter the following commands, either one at a time or by pasting all of them into your terminal at once. The order of the commands is important, though. For more information, check out the Bioconda documentation.
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
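Why does the order matter? Each conda config --add channels command puts the new channel at the top of the channel list, so the channel added last gets the highest priority. With channel_priority set to strict, conda takes a package from the highest-priority channel that has it. The bash snippet below does not call conda. It only simulates that prepending with an array to show the order the three commands above produce.

```bash
# Illustration only (no conda needed): simulate how 'conda config --add'
# builds the channel list. Each --add puts the channel at the TOP,
# so the channel added last ends up with the highest priority.
channels=()
for ch in defaults bioconda conda-forge; do   # same order as the commands above
  channels=("$ch" "${channels[@]}")           # prepend, like --add
done
printf '%s\n' "${channels[@]}"                # highest priority first
```

On a real setup you can compare this with the output of conda config --show channels, which should typically list conda-forge, bioconda, defaults in that order (a system-wide configuration on Quest could add others).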
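Note that the commands for adding channels and setting channel_priority are slightly different between micromamba and conda. The commands above are specific to conda and will probably give you an error in micromamba. In micromamba, use micromamba config prepend channels <name> in place of conda config --add channels <name> (like --add, prepend puts each channel at the top of the list), and micromamba config set channel_priority strict for the last command. If you use micromamba config append instead, run the channel commands in the reverse order so that the channel priority comes out the same.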
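For micromamba users, here is a sketch of the same channel setup as a bash function. It assumes the micromamba config prepend and micromamba config set subcommands as we understand them; the function name setup_micromamba_channels is hypothetical. Because prepend, like conda's --add, puts each channel at the top of the list, running the channels in the same order as the conda commands should give the same priority.

```bash
# Hypothetical helper: micromamba version of the conda channel setup.
# 'prepend' adds each channel to the top of the list (like conda's --add),
# so conda-forge ends up with the highest priority.
setup_micromamba_channels() {
  if ! command -v micromamba >/dev/null 2>&1; then
    echo "micromamba not found on PATH" >&2
    return 1
  fi
  local ch
  for ch in defaults bioconda conda-forge; do
    micromamba config prepend channels "$ch"
  done
  micromamba config set channel_priority strict
}
```

You would run setup_micromamba_channels once, after installing micromamba and reloading ~/.bashrc.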
Mamba environments can be set up by 1) manually creating a new environment and then adding software packages to it one at a time, 2) listing the software you want added to the environment when you create it, or 3) using a specially formatted environment.yml file to create the environment and install all the correct packages in one step. If you want more detail about setting up Conda and Mamba environments, take a look here.
Example Option 1: We could start by creating an environment we will name "ws_test1" using the -n option, activating the environment, and then installing a single software package called "circlator" from the bioconda channel (using -c) with its dependencies in that environment.
mamba create -n ws_test1
mamba activate ws_test1
mamba install -c bioconda circlator
Circlator is a program that can be used to circularise genome assemblies; we're just using it here for demonstration purposes.
When you are done using the environment, you can exit it using the conda deactivate command to return to your base environment.
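Besides the (base)-style prefix on your command prompt, you can tell which environment is active from the CONDA_DEFAULT_ENV variable, which conda (and mamba) set when an environment is activated. For an environment activated by path, it holds the full path. Here is a tiny sketch; active_env is a hypothetical helper name.

```bash
# Hypothetical helper: print the name of the active environment,
# or "none" if CONDA_DEFAULT_ENV is unset or empty.
active_env() {
  echo "${CONDA_DEFAULT_ENV:-none}"
}

# Typical interactive use:
#   mamba activate ws_test1 ; active_env   # prints ws_test1
#   conda deactivate        ; active_env   # prints the previous env, or none
```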
Example Option 2: You can also create an environment and install software in the environment in a single step like so:
mamba create -c bioconda -n ws_test2 circlator
New software can be installed in an existing environment by activating the environment and then using the mamba install command.
As a warning, the more software packages you install in an environment, the greater the risk that conflicts will arise. Sometimes it can take a very long time for Mamba to resolve those conflicts, or it can't resolve them at all. Just be aware.
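Before adding another package to a crowded environment, a dry run can show whether the solver can satisfy the request and what it would install, upgrade, or downgrade, without changing anything. This is a sketch that assumes mamba install accepts --dry-run, -n, and -c as conda install does; preview_install is a hypothetical helper name.

```bash
# Hypothetical helper: ask the solver what an install WOULD do,
# without modifying the environment.
# Arguments: environment name, then one or more package specs.
preview_install() {
  if ! command -v mamba >/dev/null 2>&1; then
    echo "mamba not found on PATH" >&2
    return 1
  fi
  local env_name="$1"
  shift
  mamba install --dry-run -n "$env_name" -c bioconda "$@"
}

# Example (not run here): preview_install ws_test2 filtlong
```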
Example Option 3: Environment files that contain lists of packages and other information can be used to quickly create environments. These are formatted as .yaml configuration files.
Here is an example of a simple environment file we could save as a text file named "environment.yml":
name: ws_test3
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - circlator
To create an environment from the environment.yml file, we would use the following command. Note that when installing from an environment file we have to use the env create function instead of the plain create used in the examples above:
mamba env create -f environment.yml
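If you prefer to stay in the terminal rather than use a text editor, the example file can also be written with a shell here-document. This is a sketch; the helper name write_env_file is hypothetical, and the file contents are exactly the ws_test3 example on this page.

```bash
# Hypothetical helper: write the example ws_test3 environment file.
# The quoted 'EOF' keeps the shell from expanding anything inside.
write_env_file() {
  local out="${1:-environment.yml}"
  cat > "$out" <<'EOF'
name: ws_test3
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - circlator
EOF
}

# Usage:
#   write_env_file environment.yml
#   mamba env create -f environment.yml
```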
Other useful Mamba commands:
- To get a list of all of your installed mamba environments:
conda env list
- To remove a mamba environment (like the ws_test1 environment we created above):
conda remove -n ws_test1 --all
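The two commands above can be combined to clean up the practice environments from this page, removing each one only if it exists. This is a sketch; env_exists and cleanup_test_envs are hypothetical helper names, and the parsing assumes the usual conda env list layout (comment lines starting with #, then one environment per line with its name in the first column).

```bash
# Hypothetical helper: read 'conda env list' output on stdin and succeed
# if an environment with the given name appears in the first column.
env_exists() {
  awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit !found }'
}

# Hypothetical helper: remove ws_test1, ws_test2 and ws_test3 if present.
cleanup_test_envs() {
  command -v conda >/dev/null 2>&1 || { echo "conda not found" >&2; return 1; }
  local env
  for env in ws_test1 ws_test2 ws_test3; do
    if conda env list | env_exists "$env"; then
      conda remove -n "$env" --all -y   # -y skips the confirmation prompt
    fi
  done
}
```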
We are going to use software in mamba environments for several exercises in the rest of the workshop. These have been preinstalled in our shared project folder: /projects/p30002/condaenvs. To activate a conda environment that is not found at the default location in your home folder, you have to give the full or relative path to the environment. For example:
conda activate /projects/p30002/condaenvs/alignment_env
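To see which environments are available in the shared folder, you can look for subfolders that contain a conda-meta directory, which every conda/mamba environment has. This is a minimal sketch; list_envs_in is a hypothetical helper name, and it defaults to the workshop folder given above.

```bash
# Hypothetical helper: list the environments stored directly under a folder.
# Every conda/mamba environment contains a conda-meta/ directory.
list_envs_in() {
  local root="${1:-/projects/p30002/condaenvs}" dir
  for dir in "$root"/*/; do
    [ -d "${dir}conda-meta" ] && basename "$dir"
  done
  return 0
}

# Usage on Quest:
#   list_envs_in
#   conda activate /projects/p30002/condaenvs/<name from the list>
```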
We are using GitHub right now for hosting the workshop documents, but GitHub's primary use is for software source code development and version control. Newer versions of software are often available on GitHub before they reach Conda/Mamba or other package managers such as APT. A folder containing the source code and supporting documents for a piece of software is called a "repository."
Software or source code can be downloaded from the GitHub website, but it's often more convenient to use the command line to get code from GitHub.
As an example, we're going to download the source code of Filtlong, a program for quality filtering of long sequencing reads generated by Nanopore or PacBio platforms.
Below are the commands to "clone" (copy) the GitHub repository for Filtlong onto your computer and compile it.
mkdir ~/applications
cd ~/applications
git clone https://github.com/rrwick/Filtlong
cd Filtlong
make
bin/filtlong -h
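The commands above run Filtlong by its path inside the repository (bin/filtlong). If you'd rather type just filtlong from any directory, one option is to add that bin folder to your PATH. This is a minimal sketch that assumes you cloned into ~/applications as above; add_to_path is a hypothetical helper name, and it skips the change if the folder is already on your PATH.

```bash
# Hypothetical helper: prepend a directory to PATH unless already present.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already on PATH, do nothing
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

add_to_path "$HOME/applications/Filtlong/bin"
# After this, 'filtlong -h' should work from any directory in this shell.
```

This only affects the current shell. To make it permanent, you could put the function and the add_to_path call in your ~/.bashrc.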
Using git and GitHub for software development and version control could be the topic of a whole other workshop. Just know they are very powerful tools for software and other development.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.