1B Install Mamba and Environments - NU-CPGME/sl_workshop_2024 GitHub Wiki
July / August, 2024
Developed by:
Egon A. Ozer, MD PhD ([email protected])
Ramon Lorenzo Redondo, PhD ([email protected])
Mamba (a C++ implementation of the Python-based Conda software) is a package manager that lets you easily install much of the software required for this workshop on your computer. Software is installed into "environments" that can be activated and deactivated from the command line. By setting up Mamba/Conda environments you can keep multiple versions of the same software on one computer and avoid conflicts between different versions of software packages or incompatible software. It also lets you install a program, together with any prerequisite programs it needs, in one step. Another major advantage of Mamba/Conda (and a reason why we'll be using it for this workshop) is that everyone can use the same version of each software application regardless of when they download it or what computer they are using, which is good for reproducibility.
For a nice introduction to the basic functionality and commands of Mamba/Conda (essentially the same for both), see this tutorial or here for more detail.
For this workshop we'll be using micromamba, as it is a very stripped-down, easy-to-install, and fast version of Mamba.
Installing micromamba should be a snap. The commands below install curl, run the micromamba installer (automatically picking all the default options), and reload your shell settings:
sudo apt install curl
"${SHELL}" < <(curl -L micro.mamba.pm/install.sh)
source ~/.bashrc
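To confirm that the installation worked, you can ask micromamba for its version. This is a minimal sketch: the `status` variable is just an illustrative name, and the check only tells you whether your current shell can find micromamba.

```shell
# Check whether the current shell can find micromamba.
if command -v micromamba >/dev/null 2>&1; then
    status="micromamba version $(micromamba --version)"
else
    status="micromamba not found; try opening a new terminal"
fi
echo "$status"
```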
Next, set up the software channels. Enter the following commands, either one at a time or by copying and pasting all of them into your terminal. The order of the commands is important, though. For more information, check out Bioconda.
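# Each append adds a channel to the end of the list, so the channel added first gets the highest priority.
micromamba config append channels conda-forge
micromamba config append channels bioconda
micromamba config append channels defaults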
micromamba config set channel_priority strict
Note: the commands for adding channels and setting channel_priority differ slightly between micromamba and conda. The commands above are specific to micromamba and will probably give you an error in conda.
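For comparison, the sketch below shows the conda form of the same setup. It is wrapped in a function (the name `configure_conda_channels` is only illustrative) so that nothing changes until you call it. Note that `conda config --add` places each channel at the top of the list, so the channel added last ends up with the highest priority.

```shell
# Conda equivalent of the micromamba channel setup.
# The function name is illustrative, not part of conda.
configure_conda_channels() {
    # --add puts each channel at the top of the list,
    # so conda-forge (added last) gets the highest priority.
    conda config --add channels defaults
    conda config --add channels bioconda
    conda config --add channels conda-forge
    conda config --set channel_priority strict
}

# Run it only if you are using conda rather than micromamba:
# configure_conda_channels
```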
Mamba environments can be set up by 1) manually creating a new environment and then adding software packages to it one at a time, 2) listing the software you want when you create the environment, or 3) using a specially formatted environment.yml file to create the environment and install all the correct packages in one step. If you want more detail about setting up Conda and Mamba environments, take a look here.
Example Option 1: We could start by creating an environment named "ws_test1" using the -n option, activating the environment, and then installing a single software package called "circlator", along with its dependencies, from the bioconda channel (using -c) in that environment.
micromamba create -n ws_test1
micromamba activate ws_test1
micromamba install -c bioconda circlator
Circlator is a program for circularising genome assemblies. We're just using it for demonstration purposes here.
When you are done using the environment, you can exit it using the micromamba deactivate command to return to your base environment.
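To check what ended up in an environment, you can list its packages without activating it. A minimal sketch, assuming the ws_test1 environment from Option 1 exists; `env_name` is an illustrative variable.

```shell
env_name="ws_test1"

# List the packages installed in the environment (no activation needed).
if command -v micromamba >/dev/null 2>&1; then
    micromamba list -n "$env_name" || echo "could not list $env_name; has it been created?"
else
    echo "micromamba not found"
fi
```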
Example Option 2: You can also create an environment and install software in the environment in a single step like so:
micromamba create -c bioconda -n ws_test2 circlator
New software can be installed in an existing environment by activating the environment and then using the micromamba install command.
As a warning, the more software packages you install in an environment, the greater the risk that dependency conflicts will arise. Sometimes it can take Mamba a very long time to resolve those conflicts, and sometimes it can't resolve them at all.
Example Option 3: Environment files that contain lists of packages and other information can be used to quickly create environments. These are formatted as .yml configuration files.
Here is an example of a simple environment file we could save as a text file named "environment.yml":
name: ws_test3
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- circlator
To create an environment from the environment.yml file we would use the following command:
micromamba env create -f environment.yml
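If you prefer to stay in the terminal, the environment file can also be written with a shell here-document. This sketch writes the same example file shown above; `env_file` is an illustrative variable, and the command overwrites any existing file with that name.

```shell
env_file="environment.yml"

# Write the example environment file (overwrites an existing file).
cat > "$env_file" <<'EOF'
name: ws_test3
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - circlator
EOF

# Show the result before creating the environment from it.
cat "$env_file"
```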
Other useful Mamba commands:
- To get a list of all of your installed mamba environments:
micromamba env list
- To remove a mamba environment (like the ws_test1 environment we created above):
micromamba remove -n ws_test1 --all
We are going to use software in mamba environments for several exercises in the rest of the workshop. Since they're relatively small and uncomplicated, we'll just create the environments manually here:
micromamba create -y \
-n assembly \
-c conda-forge \
-c bioconda \
-c defaults \
fastqc quast fastp spades
In case of a weak internet connection, you can instead create the empty environment first and then install the packages one at a time, running the following commands in the order given:
micromamba create -y -n assembly -c conda-forge -c bioconda -c defaults
micromamba activate assembly
micromamba install -y -n assembly -c bioconda fastqc
micromamba install -y -n assembly -c bioconda quast
micromamba install -y -n assembly -c bioconda fastp
micromamba install -y -n assembly -c bioconda spades
micromamba create -y \
-n annotation \
-c conda-forge \
-c bioconda \
-c defaults \
prokka=1.14.6
micromamba create -y \
-n alignment \
-c conda-forge \
-c bioconda \
-c defaults \
ivar snippy=4.6.0 snpeff=4
micromamba create -y \
-n phylogenetics \
-c conda-forge \
-c bioconda \
-c defaults \
treetime iqtree mafft
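Once the environments are created, it can be handy to confirm that all four exist. The sketch below is one way to do that. `check_envs` is a hypothetical helper (not a micromamba command) that searches the text printed by micromamba env list for each name, so it may be fooled by unusual directory names.

```shell
# Report which environment names appear in an environment listing.
# $1: text printed by `micromamba env list`; remaining arguments: names to look for.
check_envs() {
    env_listing="$1"
    shift
    for name in "$@"; do
        # Match the name as a whole word or as the last component of a path.
        if printf '%s\n' "$env_listing" | grep -Eq "(^|[[:space:]/])${name}([[:space:]]|\$)"; then
            echo "$name: found"
        else
            echo "$name: missing"
        fi
    done
}

# Usage (requires micromamba):
# check_envs "$(micromamba env list)" assembly annotation alignment phylogenetics
```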
We are using GitHub right now for hosting the workshop documents and data, but GitHub's primary use is software source code development and version control. Newer versions of software are often available on GitHub before they reach Conda/Mamba or other package managers such as APT. A folder containing source code and supporting documents for a piece of software is called a "repository."
Software or source code can be downloaded over the web from the GitHub website, but it's often more convenient to use the command line to get code from GitHub.
As an example, we're going to download the source code of Filtlong, a program for quality trimming long sequencing reads generated by Nanopore or PacBio platforms.
Below are the commands to "clone" (copy) the GitHub repository for Filtlong onto your computer and compile it.
mkdir -p ~/applications
cd ~/applications
git clone https://github.com/rrwick/Filtlong
cd Filtlong
make
bin/filtlong -h
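The compiled program lives in ~/applications/Filtlong/bin, so it has to be called by its path. One optional convenience, not part of the steps above, is to add that directory to your PATH for the current terminal session. In this sketch `filtlong_bin` is an illustrative variable.

```shell
filtlong_bin="$HOME/applications/Filtlong/bin"

# Add the Filtlong bin directory to PATH for this session only.
export PATH="$filtlong_bin:$PATH"

# Report where the shell now finds filtlong (if it has been compiled).
command -v filtlong || echo "filtlong not found; did make finish without errors?"
```

To make the change permanent, you could append the export line to your ~/.bashrc file.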
Now we'll use git to download the data files we'll be using throughout the rest of the workshop.
cd ~
git clone https://github.com/NU-CPGME/sl_workshop_2024
cd ~/sl_workshop_2024
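If the workshop materials are updated after you clone them, you do not need to clone again; git pull fetches the changes. A minimal sketch, assuming the repository was cloned into your home directory as above (`repo_dir` and `result` are illustrative variables):

```shell
repo_dir="$HOME/sl_workshop_2024"

# Update an existing clone; --ff-only refuses to merge if local history has diverged.
if [ -d "$repo_dir/.git" ]; then
    if git -C "$repo_dir" pull --ff-only; then
        result="up to date"
    else
        result="pull failed"
    fi
else
    result="not cloned yet"
fi
echo "$repo_dir: $result"
```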
Using git and GitHub for software development and version control could be the topic of a whole other workshop. Just know that they are very powerful tools for software and other development.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.