HUMAnN - quadram-institute-bioscience/biobakery-2024 GitHub Wiki

What is HUMAnN

HUMAnN (the HMP Unified Metabolic Analysis Network) is a computational profiler that estimates the abundance of microbial metabolic pathways and gene families from metagenomic or metatranscriptomic sequencing data.

Running Humann3

Complete documentation for HUMAnN 3 can be found in the bioBakery wiki.

Pre-requisites

Humann3 installation

HUMAnN 3 is installed on the QIB HPC system in several versions. You can list the currently available packages using the NBI-slurm utility:

source package nbi-slurm
shelf humann

If you can't find the specific version of the tool you want to use, you can install HUMAnN yourself using the following instructions.

Databases

The reference databases have been downloaded by the core bioinformatics team and are shared for everyone to use:

MPA
- /qib/platforms/Informatics/databases/humann_db/mpa/mpa_vOct22_CHOCOPhlAnSGB_202212
HUMANN nuc
- /qib/platforms/Informatics/databases/humann_db/2023/chocophlan
HUMANN prot
- /qib/platforms/Informatics/databases/humann_db/uniref

If you can't find your database of interest, don't hesitate to contact us, and we'll download it for you!

Basic Usage

Humann can be run after QC and human read removal (see this tutorial) on your fastq files as follows:

source package e59dcdcb-efe4-4b6c-90fc-f35899b7e1a2 # Humann3.8
MPA="/qib/platforms/Informatics/databases/humann_db/mpa/mpa_vOct22_CHOCOPhlAnSGB_202212"
HUMANN_NUC="/qib/platforms/Informatics/databases/humann_db/2023/chocophlan"
HUMANN_PROT="/qib/platforms/Informatics/databases/humann_db/uniref"

# YOURFILE is the path to your FASTQ file, YOUROUTDIR the directory for the results
humann --input "${YOURFILE}" --output "${YOUROUTDIR}" --metaphlan-options "--offline --bowtie2db $MPA" --nucleotide-database "$HUMANN_NUC" --protein-database "$HUMANN_PROT"
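To process several samples, the command above can be wrapped in a loop. The following is a minimal sketch, not a tested pipeline: FASTQ_DIR and OUT_DIR are hypothetical directory names, input files are assumed to be named `<sample>.fastq`, and the HUMAnN package is assumed to have been loaded with `source package` beforehand. By default the script only prints the commands it would run, so they can be checked before anything is submitted.

```shell
#!/bin/sh
set -eu

# Database paths as listed on this page
MPA="/qib/platforms/Informatics/databases/humann_db/mpa/mpa_vOct22_CHOCOPhlAnSGB_202212"
HUMANN_NUC="/qib/platforms/Informatics/databases/humann_db/2023/chocophlan"
HUMANN_PROT="/qib/platforms/Informatics/databases/humann_db/uniref"

# Hypothetical locations: adjust to your own project layout
FASTQ_DIR="${FASTQ_DIR:-cleaned_reads}"
OUT_DIR="${OUT_DIR:-humann_out}"

# Dry run by default: commands are echoed. Set RUN="" to really execute them.
RUN="${RUN-echo}"

run_humann() {
    # $1: path to one quality-controlled FASTQ file
    $RUN humann --input "$1" --output "$OUT_DIR" \
        --metaphlan-options "--offline --bowtie2db $MPA" \
        --nucleotide-database "$HUMANN_NUC" \
        --protein-database "$HUMANN_PROT"
}

for fastq in "$FASTQ_DIR"/*.fastq; do
    [ -e "$fastq" ] || continue   # the glob matched nothing
    run_humann "$fastq"
done
```

Once the printed commands look right, rerunning the script with `RUN=""` in the environment executes them one sample at a time.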

When HUMAnN is run from any input type, three main output files will be created:

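$SAMPLE_genefamilies.tsv : gene family abundances in RPK (reads per kilobase), given as community totals and stratified by contributing species
$SAMPLE_pathabundance.tsv : pathway abundances, given as community totals and stratified by contributing species
$SAMPLE_pathcoverage.tsv : pathway coverage, describing the presence of each pathway independently of its abundance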
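In the stratified tables, a row whose feature name contains a `|` holds the contribution of one species (or `unclassified`), while a row without `|` holds the community total for that feature. The sketch below separates the two kinds of rows with awk. The table is a made-up illustration: the feature names and values are invented, and only the `|` convention is taken from HUMAnN's output format. HUMAnN also ships a `humann_split_stratified_table` utility for this purpose.

```shell
#!/bin/sh
set -eu

# Build a tiny made-up table in the same layout as a genefamilies file
TABLE="$(mktemp)"
trap 'rm -f "$TABLE"' EXIT
printf '# Gene Family\tdemo_Abundance-RPKs\n' > "$TABLE"
printf 'UNMAPPED\t100.0\n' >> "$TABLE"
printf 'UniRef90_EXAMPLE\t30.0\n' >> "$TABLE"
printf 'UniRef90_EXAMPLE|g__Genus.s__Genus_species\t20.0\n' >> "$TABLE"
printf 'UniRef90_EXAMPLE|unclassified\t10.0\n' >> "$TABLE"

unstratified() {
    # Keep the header and every row whose feature name has no "|"
    awk -F '\t' 'NR == 1 || index($1, "|") == 0' "$1"
}

stratified() {
    # Keep the header and every row whose feature name contains "|"
    awk -F '\t' 'NR == 1 || index($1, "|") > 0' "$1"
}

echo "Community totals:"
unstratified "$TABLE"
echo "Per-species contributions:"
stratified "$TABLE"
```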

Normalization scripts

HUMAnN provides a utility script, humann_renorm_table, that normalizes abundances to relative abundance (--units relab) or "copies per million" (--units cpm). It can be run on either the genefamilies or the pathabundance output file:

humann_renorm_table --input ${HUMANN_TABLE} --output cpm_${HUMANN_TABLE} --units cpm
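The arithmetic behind CPM is simple: each value is divided by the sample total and multiplied by one million. The sketch below illustrates this on a made-up, single-sample table with unstratified rows only. It is a simplified illustration of the calculation, not a reimplementation of humann_renorm_table, which also deals with stratified rows and multi-sample tables; the feature names and values are invented.

```shell
#!/bin/sh
set -eu

# Made-up single-sample table: three features with a total of 100
TABLE="$(mktemp)"
trap 'rm -f "$TABLE"' EXIT
printf '# Gene Family\tdemo_Abundance-RPKs\n' > "$TABLE"
printf 'FEATURE_A\t10\nFEATURE_B\t30\nFEATURE_C\t60\n' >> "$TABLE"

to_cpm() {
    # The file is read twice: pass 1 sums column 2, pass 2 rescales it
    awk -F '\t' '
        FNR == 1  { if (NR != FNR) print; next }   # header, printed on pass 2 only
        NR == FNR { total += $2; next }            # pass 1: accumulate the total
        { printf "%s\t%.1f\n", $1, $2 * 1000000 / total }
    ' "$1" "$1"
}

to_cpm "$TABLE"
```

With the values 10, 30 and 60 the rows become 100000.0, 300000.0 and 600000.0, which sum to one million as expected.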

Merging Humann3 outputs:

You can merge the per-sample gene family abundance/pathway abundance outputs into a single table using the script humann_join_tables:

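# -o takes the merged table to write (a file, not a directory); the file name here is only an example
humann_join_tables -i ${HUMANN_DIR} -o ${OUTPUT_DIR}/merged_genefamilies.tsv --file_name genefamilies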
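Before merging, it can be worth checking which files will be picked up. This sketch assumes that --file_name selects every file in the input directory whose name contains the given string; under that assumption, renormalized copies such as cpm_*_genefamilies.tsv stored in the same directory would be merged together with the raw tables. HUMANN_DIR is a hypothetical directory name.

```shell
#!/bin/sh
set -eu

# Hypothetical directory holding the per-sample HUMAnN outputs
HUMANN_DIR="${HUMANN_DIR:-humann_out}"

list_tables() {
    # $1: directory to inspect, $2: string that file names must contain
    find "$1" -maxdepth 1 -type f -name "*$2*" | sort
}

if [ -d "$HUMANN_DIR" ]; then
    echo "Files that match 'genefamilies' in $HUMANN_DIR:"
    list_tables "$HUMANN_DIR" genefamilies
fi
```

If the listing contains both raw and renormalized tables, moving one set to a separate directory before running humann_join_tables keeps the merged table consistent.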

Analysing Humann output

An example of analysing the output of HUMAnN 3 using R:

PINGU Dataset Analysis