02 Installing Anvi'o for Metagenomics Analysis

Introduction

This guide walks you through installing Anvi'o on the VSC (Vlaams Supercomputer Centrum) infrastructure. To install Anvi'o on your local machine instead, follow the official Anvi'o installation instructions.

What you'll install

Conda

Conda is a package manager that helps you install and manage software environments. Think of it as an app store for scientific software. It keeps different tools organized and prevents conflicts between software versions. We need it to create the environment in which Anvi'o and its dependencies are installed.

📚 Conda Documentation

Anvi'o

Anvi'o (Analysis and Visualization platform for 'omics data) is an interactive platform for genome binning and visualization. We'll use it to manually group assembled contigs into genome bins - essentially reconstructing individual genomes from mixed microbial communities.

🌐 Anvi'o Website | 📖 Anvi'o Tutorials

How you'll install (VSC)

Step 1: Install Miniconda

Download and install Miniconda (a lightweight version of conda):

# Download Miniconda installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Install to your data directory
bash Miniconda3-latest-Linux-x86_64.sh -b -p $VSC_DATA/miniconda3
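The installer refuses to write into a prefix that already exists, so it can help to check for an earlier installation before running it. The following is a minimal sketch under that assumption; the helper name `needs_miniconda` is hypothetical, and the fallback to `$HOME` is only there so the sketch runs where `$VSC_DATA` is unset.

```shell
# Succeed (exit status 0) when no conda executable exists under the given prefix.
needs_miniconda() {
    [ ! -x "$1/bin/conda" ]
}

# Falls back to $HOME only for this sketch; on the VSC, $VSC_DATA should be set.
prefix="${VSC_DATA:-$HOME}/miniconda3"

if needs_miniconda "$prefix"; then
    echo "No conda found in $prefix: run the installer command above."
else
    echo "conda already present in $prefix: the installer can be skipped."
fi
```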

If the installation command fails with an error about the path, use the full path instead of the $VSC_DATA variable:

# First, print your home directory to find your VSC username
echo "$HOME"
# This shows something like: /user/leuven/347/vsc34774

Your VSC username is the last part (e.g., vsc34774). Now construct your full path:

  • Replace XXX with the three-digit number (e.g., 347) from the middle of your home directory path
  • Replace vscXXXXX with your VSC username (e.g., vsc34774)

bash Miniconda3-latest-Linux-x86_64.sh -b -p /data/leuven/XXX/vscXXXXX/miniconda3

Example for user vsc34774:

bash Miniconda3-latest-Linux-x86_64.sh -b -p /data/leuven/347/vsc34774/miniconda3
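The substitution above can also be done mechanically. This sketch assumes, as in the example, that the data directory mirrors the home directory with `/user` replaced by `/data`; the function name `home_to_data` is made up for illustration.

```shell
# Turn a VSC home path into the matching data path by swapping the
# leading /user/ for /data/ (assumed layout, taken from the example above).
home_to_data() {
    printf '%s\n' "$1" | sed 's#^/user/#/data/#'
}

home_to_data /user/leuven/347/vsc34774
# prints /data/leuven/347/vsc34774
```

If the result looks right for your own account, `"$(home_to_data "$HOME")/miniconda3"` could be passed to the installer's `-p` option.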

Step 2: Configure your shell

Add conda to your PATH so you can use it from anywhere:

# Edit your bash configuration file
nano ~/.bashrc

# Add this line at the end (if you used the full path in Step 1, use that path in place of ${VSC_DATA}):
export PATH="${VSC_DATA}/miniconda3/bin:${PATH}"

# Save and exit (Ctrl+X, then Y, then Enter)

# Reload your configuration
source ~/.bashrc
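If you prefer not to edit the file by hand, the same line can be appended from the command line. This is a sketch rather than part of the original procedure: `add_line_once` is a hypothetical helper that appends a line only when the file does not already contain it, so running it twice does not duplicate the entry.

```shell
# Append a line ($2) to a file ($1) unless that exact line is already present.
add_line_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Example call (uncomment to apply it to your own ~/.bashrc).
# Single quotes keep ${VSC_DATA} and ${PATH} unexpanded in the file.
# add_line_once "$HOME/.bashrc" 'export PATH="${VSC_DATA}/miniconda3/bin:${PATH}"'
```

Afterwards, reload the configuration with `source ~/.bashrc` as described above.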

Step 3: Verify conda installation

# Check conda is accessible
which conda

# Check version
conda --version

Question: What version of conda did you install?
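Beyond checking that conda is found, it can be useful to confirm that the conda on your PATH is the one you just installed and not a system-wide copy. A small sketch, assuming the installation prefix from Step 1; `is_under` is a hypothetical helper.

```shell
# Succeed when path $1 lies inside directory $2.
is_under() {
    case "$1" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

conda_path=$(command -v conda || true)
expected="${VSC_DATA:-}/miniconda3"

if [ -z "$conda_path" ]; then
    echo "conda not found on PATH: revisit Step 2."
elif is_under "$conda_path" "$expected"; then
    echo "OK: conda resolves to $conda_path"
else
    echo "Warning: conda resolves to $conda_path, not to $expected"
fi
```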

Step 4: Accept conda Terms of Service (if needed)

If you see an error about Terms of Service when creating environments, run:

conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

Step 5: Create Anvi'o environment

Create a dedicated environment for Anvi'o with Python 3.10:

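# Leave any environment that is currently active
conda deactivate
# Remove an earlier anvio-9 environment; on a first install conda reports that it does not exist, which can be ignored
conda remove -n anvio-9 --all -y
# Create a fresh environment with Python 3.10 and pip from conda-forge
conda create -y --name anvio-9 python=3.10 pip -c conda-forge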
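To find out beforehand whether an earlier `anvio-9` environment exists, the output of `conda env list` can be inspected. This sketch only reports the status and changes nothing; `has_env` is a hypothetical helper that looks for the name in the first column of that output.

```shell
# Succeed if the environment name ($1) appears in the first column of
# `conda env list` output supplied on stdin.
has_env() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

if command -v conda >/dev/null 2>&1; then
    if conda env list | has_env anvio-9; then
        echo "anvio-9 already exists: remove it before creating a new one."
    else
        echo "anvio-9 not found: there is nothing to remove."
    fi
else
    echo "conda not found on PATH: complete Steps 1-3 first."
fi
```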

Step 6: Activate environment and install Anvi'o

# Activate the new environment
source activate anvio-9

# Download Anvi'o v9
curl -L https://github.com/merenlab/anvio/releases/download/v9/anvio-9.tar.gz \
    --output $VSC_DATA/anvio-9.tar.gz

# Install Anvi'o
pip install $VSC_DATA/anvio-9.tar.gz

# Install required dependencies
conda install -y -c conda-forge -c bioconda python=3.10 hmmer prodigal
conda install -y -c conda-forge nodejs

# Load preinstalled modules
module load DIAMOND
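After these commands, a quick check that the tools are reachable can save debugging later. The sketch below assumes the usual executable names: `hmmscan` (part of HMMER), `prodigal`, `node` (from nodejs) and `diamond` (provided by the DIAMOND module). `missing_tools` is a hypothetical helper.

```shell
# Print every command name from the arguments that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools hmmscan prodigal node diamond)

if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All checked tools are available."
fi
```

Because `module load` only affects the current session, a missing `diamond` in a new terminal usually means the module needs to be loaded again.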

Step 7: Download databases

To assign taxonomy to our microbial genomes, we need a reference database of known organisms. We'll use the Genome Taxonomy Database (GTDB), which provides standardized taxonomic classifications based on genome phylogeny for bacteria and archaea. This database enables Anvi'o to compare single-copy core genes in your contigs against thousands of reference genomes to predict which organisms are present in your sample.

# The following command should only take a few seconds:
anvi-setup-scg-taxonomy --scgs-taxonomy-data-dir $VSC_DATA --num-threads 8

Step 8: Verify installation

# Check Anvi'o version
anvi-interactive --version

You should see:

Anvi'o .......................................: eunice (v9)
Python .......................................: 3.10.19

Profile database .............................: 40
Contigs database .............................: 24
Pan database .................................: 21
Genome data storage ..........................: 7
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2
Genes database ...............................: 6
Auxiliary data storage .......................: 2
Workflow configurations ......................: 4
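If you want to pull a single value out of this report, for example to paste it into a support email, the lines can be parsed by their label. This is a sketch that assumes the plain-text layout shown above; `anvio_version_field` is a hypothetical helper, and `2>&1` is used because it is not certain which output stream Anvi'o writes the report to.

```shell
# Read the version report on stdin and print the value for one label ($1),
# with the dotted padding and trailing spaces removed.
anvio_version_field() {
    awk -v label="$1" '
        index($0, label " .") == 1 {
            sub(/^[^:]*: */, "")   # drop "Label ......: "
            sub(/[ \t]+$/, "")     # drop trailing padding
            print
            exit
        }'
}

if command -v anvi-interactive >/dev/null 2>&1; then
    py=$(anvi-interactive --version 2>&1 | anvio_version_field Python)
    echo "Python reported by Anvi'o: ${py:-unknown}"
else
    echo "anvi-interactive not found: activate the anvio-9 environment first."
fi
```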

Troubleshooting

If you are experiencing problems, please contact:

📧 Manon Geerts: [email protected]

Please include:

  • Which system you're working on. If you're using a Mac, include information found in About This Mac.
  • The exact error message you're seeing
  • Which step you're stuck on