02 Installing Anvi'o for Metagenomics Analysis - GeertsManon/EEG_Metagenomics GitHub Wiki
Introduction
This guide walks you through installing Anvi'o on the VSC (Vlaams Supercomputer Centrum) infrastructure. To install Anvi'o on your local machine instead, follow the instructions from the official sources:
- Miniconda installation:
- Anvi'o official installation guides:
What you'll install
Conda
Conda is a package manager that installs software and manages isolated software environments. Think of it as an app store for scientific software: it keeps different tools organized and prevents conflicts between software versions. We need it to set up the environment in which Anvi'o is installed.
Anvi'o
Anvi'o (Analysis and Visualization platform for 'omics data) is an interactive platform for genome binning and visualization. We'll use it to manually group assembled contigs into genome bins - essentially reconstructing individual genomes from mixed microbial communities.
🌐 Anvi'o Website | 📖 Anvi'o Tutorials
How you'll install (VSC)
Step 1: Install Miniconda
Download and install Miniconda (a lightweight version of conda):
# Download Miniconda installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Install to your data directory
bash Miniconda3-latest-Linux-x86_64.sh -b -p $VSC_DATA/miniconda3
If the installation command fails with an error about the path, use the full path instead of the $VSC_DATA variable:
# First, find your VSC user number
pwd
# This shows something like: /user/leuven/347/vsc34774
Your VSC user number is the last part (e.g., vsc34774). Now construct your full path:
- Replace XXX with the three-digit number from the middle part of your home directory path shown by pwd (e.g., 347)
- Replace vscXXXXX with your VSC username (e.g., vsc34774)
bash Miniconda3-latest-Linux-x86_64.sh -b -p /data/leuven/XXX/vscXXXXX/miniconda3
Example for user vsc34774:
bash Miniconda3-latest-Linux-x86_64.sh -b -p /data/leuven/347/vsc34774/miniconda3
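The substitution above can also be scripted. The following is a minimal sketch, assuming (as the example paths in this guide suggest) that the data directory mirrors the home directory with the leading /user replaced by /data. The function name data_dir_from_home is hypothetical and not part of any VSC tooling.

```shell
# Hypothetical helper: derive the data directory from a home directory path,
# assuming the layout /user/<site>/<XXX>/<vscXXXXX> maps to /data/<site>/<XXX>/<vscXXXXX>.
data_dir_from_home() {
    case "$1" in
        /user/*)
            # Strip the leading /user/ and prepend /data/
            printf '/data/%s\n' "${1#/user/}"
            ;;
        *)
            echo "unexpected home path: $1" >&2
            return 1
            ;;
    esac
}

# Example with the path from this guide
data_dir_from_home "/user/leuven/347/vsc34774"
```

If the printed path matches your real data directory, append /miniconda3 to it and pass the result to the installer's -p option.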
Step 2: Configure your shell
Add conda to your PATH so you can use it from anywhere:
# Edit your bash configuration file
nano ~/.bashrc
# Add this line at the end:
export PATH="${VSC_DATA}/miniconda3/bin:${PATH}"
# Save and exit (Ctrl+X, then Y, then Enter)
# Reload your configuration
source ~/.bashrc
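Editing ~/.bashrc by hand works, but repeating the setup can leave duplicate lines behind. A minimal sketch of an idempotent alternative follows; the helper name append_once is hypothetical, and the demonstration writes to a scratch file rather than your real configuration.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not already present.
append_once() {
    file="$1"
    line="$2"
    # -F: fixed string, -x: match the whole line, -q: quiet
    grep -Fxq -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file instead of the real ~/.bashrc
demo_rc="$(mktemp)"
append_once "$demo_rc" 'export PATH="${VSC_DATA}/miniconda3/bin:${PATH}"'
append_once "$demo_rc" 'export PATH="${VSC_DATA}/miniconda3/bin:${PATH}"'
cat "$demo_rc"   # the export line appears only once
```

To apply it for real, call append_once with ~/.bashrc as the first argument and then run source ~/.bashrc.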
Step 3: Verify conda installation
# Check conda is accessible
which conda
# Check version
conda --version
❓ Question: What version of conda did you install?
Step 4: Accept conda Terms of Service (if needed)
If you see an error about Terms of Service when creating environments, run:
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
Step 5: Create Anvi'o environment
Create a dedicated environment for Anvi'o with Python 3.10:
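# Leave any currently active environment
conda deactivate
# Remove a previous anvio-9 environment (on a first install this reports that none exists, which is harmless)
conda remove -n anvio-9 --all -y
# Create a fresh environment with Python 3.10 and pip from conda-forge
conda create -y --name anvio-9 python=3.10 pip -c conda-forge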
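If you would rather not delete an existing environment on every run, the creation step can be guarded. This is a minimal sketch, assuming that conda env list prints each environment name in the first column of its own line. The helper names env_exists and ensure_env are hypothetical.

```shell
# Hypothetical helper: succeed if a conda environment with the given name is listed.
env_exists() {
    conda env list | awk '{print $1}' | grep -Fxq -- "$1"
}

# Hypothetical wrapper: create the environment only when it is missing.
# $1 = environment name, $2 = Python version
ensure_env() {
    if env_exists "$1"; then
        echo "$1 already exists, skipping creation"
    else
        conda create -y --name "$1" "python=$2" pip -c conda-forge
    fi
}
```

With conda on your PATH, calling ensure_env anvio-9 3.10 would either report the existing environment or run the same conda create command shown in this step.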
Step 6: Activate environment and install Anvi'o
# Activate the new environment
source activate anvio-9
# Download Anvi'o v9
curl -L https://github.com/merenlab/anvio/releases/download/v9/anvio-9.tar.gz \
--output $VSC_DATA/anvio-9.tar.gz
# Install Anvi'o
pip install $VSC_DATA/anvio-9.tar.gz
# Install required dependencies
conda install -y -c conda-forge -c bioconda python=3.10 hmmer prodigal
conda install -y -c conda-forge nodejs
# Load preinstalled modules
module load DIAMOND
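Before moving on, it can help to confirm that the tools from this step are reachable in your current shell. The sketch below assumes the usual executable names (hmmsearch from HMMER, prodigal, diamond from the DIAMOND module, and node from nodejs); missing_tools is a hypothetical helper name.

```shell
# Hypothetical helper: print every given command name that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Report anything from Step 6 that is not reachable in the current shell
missing="$(missing_tools hmmsearch prodigal diamond node)"
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "Not found on PATH:" $missing
else
    echo "All required tools found"
fi
```

If a tool is reported as missing, check that the anvio-9 environment is active and that the DIAMOND module is loaded in the same session.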
Step 7: Download databases
To assign taxonomy to our microbial genomes, we need a reference database of known organisms. We'll use the Genome Taxonomy Database (GTDB), which provides standardized taxonomic classifications based on genome phylogeny for bacteria and archaea. This database enables Anvi'o to compare single-copy core genes in your contigs against thousands of reference genomes to predict which organisms are present in your sample.
# The following command should only take a few seconds:
anvi-setup-scg-taxonomy --scgs-taxonomy-data-dir $VSC_DATA --num-threads 8
Step 8: Verify installation
# Check Anvi'o version
anvi-interactive --version
You should see:
Anvi'o .......................................: eunice (v9)
Python .......................................: 3.10.19
Profile database .............................: 40
Contigs database .............................: 24
Pan database .................................: 21
Genome data storage ..........................: 7
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2
Genes database ...............................: 6
Auxiliary data storage .......................: 2
Workflow configurations ......................: 4
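The version check can also be automated, for example at the top of a job script. This is a minimal sketch that looks for the release string in output formatted like the listing above. The helper is_anvio_v9 is hypothetical and reads text on standard input; the real output may differ slightly in spacing or colouring, so treat the pattern as an assumption.

```shell
# Hypothetical helper: succeed if the text on stdin reports Anvi'o v9.
is_anvio_v9() {
    grep -Eq "Anvi'o \.+: .*\(v9\)"
}

# Demonstration on the first line of the expected output from this guide
sample="Anvi'o .......................................: eunice (v9)"
if printf '%s\n' "$sample" | is_anvio_v9; then
    echo "Anvi'o v9 detected"
fi
```

In practice the helper would be fed by the real command: anvi-interactive --version | is_anvio_v9 || echo "unexpected Anvi'o version".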
Troubleshooting
If you are experiencing problems, please contact:
📧 Manon Geerts: [email protected]
Please include:
- Which system you're working on. If you're using a Mac, include the information found in About This Mac.
- The exact error message you're seeing
- Which step you're stuck on
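To gather the system details requested above in one go, something like the following could be pasted into the terminal and its output copied into your email. It is only a sketch: the function name collect_info and the output labels are arbitrary choices.

```shell
# Hypothetical helper: print basic system details to include in a support email.
collect_info() {
    echo "System: $(uname -srm)"
    echo "Host:   $(uname -n)"
    if command -v conda >/dev/null 2>&1; then
        echo "Conda:  $(conda --version)"
    else
        echo "Conda:  not found on PATH"
    fi
}

collect_info
```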