HumaNn - quadram-institute-bioscience/biobakery-2024 GitHub Wiki
What is Humann
Humann is a computational profiler allowing users to estimate the abundance of microbial metabolic pathways and gene families from metagenomic or metatranscriptomic sequencing data.
Running Humann3
Complete documentation for Humann3 can be found in the bioBakery wiki.
Pre-requisites
Humann3 installation
Humann3 is installed on the QIB HPC system in several versions. You can list the currently available packages using the NBI-slurm utility:
source package nbi-slurm
shelf humann
If you don't find the specific version of the tool you want to use, you can install Humann for yourself using the following instructions.
Databases
The reference databases are downloaded by the core bioinformatics team and shared for everyone to use:
- MPA
- /qib/platforms/Informatics/databases/humann_db/mpa/mpa_vOct22_CHOCOPhlAnSGB_202212
- HUMANN nuc
- /qib/platforms/Informatics/databases/humann_db/2023/chocophlan
- HUMANN prot
- /qib/platforms/Informatics/databases/humann_db/uniref
If you can't find your database of interest, don't hesitate to contact us, and we'll download it for you!
Basic Usage
Humann can be run after QC and human read removal (see this tutorial) on your fastq files as follows:
source package e59dcdcb-efe4-4b6c-90fc-f35899b7e1a2 # Humann3.8
MPA="/qib/platforms/Informatics/databases/humann_db/mpa/mpa_vOct22_CHOCOPhlAnSGB_202212"
HUMANN_NUC="/qib/platforms/Informatics/databases/humann_db/2023/chocophlan"
HUMANN_PROT="/qib/platforms/Informatics/databases/humann_db/uniref"
# YOURFILE is the path to your input fastq file, YOUROUTDIR the output directory
humann --input "${YOURFILE}" --output "${YOUROUTDIR}" --metaphlan-options "--offline --bowtie2db $MPA" --nucleotide-database "$HUMANN_NUC" --protein-database "$HUMANN_PROT"
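With several samples, the same command can be wrapped in a loop. The following is only a minimal sketch and not part of the original instructions: the function name `run_humann_batch`, the `HUMANN_BIN` variable, and the assumption that input files end in `.fastq` are all hypothetical; the database paths are the shared ones listed above.

```shell
# Load the tool first, as shown above:
# source package e59dcdcb-efe4-4b6c-90fc-f35899b7e1a2 # Humann3.8

MPA="/qib/platforms/Informatics/databases/humann_db/mpa/mpa_vOct22_CHOCOPhlAnSGB_202212"
HUMANN_NUC="/qib/platforms/Informatics/databases/humann_db/2023/chocophlan"
HUMANN_PROT="/qib/platforms/Informatics/databases/humann_db/uniref"

# run_humann_batch INPUT_DIR OUTPUT_DIR  (hypothetical helper)
# Runs HUMAnN once per *.fastq file, writing each sample to its own subdirectory.
run_humann_batch() {
    input_dir="$1"
    output_dir="$2"
    # HUMANN_BIN is an illustrative override: set HUMANN_BIN=echo for a dry run
    humann_bin="${HUMANN_BIN:-humann}"
    for fq in "$input_dir"/*.fastq; do
        [ -e "$fq" ] || continue              # no matching files: skip the literal pattern
        sample="$(basename "$fq" .fastq)"     # sample name = file name without extension
        "$humann_bin" --input "$fq" --output "$output_dir/$sample" \
            --metaphlan-options "--offline --bowtie2db $MPA" \
            --nucleotide-database "$HUMANN_NUC" \
            --protein-database "$HUMANN_PROT"
    done
}

# Example (paths are placeholders):
# run_humann_batch /path/to/clean_reads /path/to/humann_out
```

Running it once with `HUMANN_BIN=echo` prints the commands without executing them, which is a cheap way to check paths before committing compute time.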
When HUMAnN is run from any input type, three main output files will be created:
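- $SAMPLE_genefamilies.tsv : gene family abundances in RPK (reads per kilobase), given as community totals and stratified by contributing species
- $SAMPLE_pathabundance.tsv : pathway abundances, given as community totals and stratified by contributing species
- $SAMPLE_pathcoverage.tsv : pathway coverage, a score between 0 and 1 indicating how confidently each pathway is detected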
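In the stratified tables, rows whose feature ID contains a `|` hold the contribution of a single species, while rows without one hold the community total. HUMAnN ships a `humann_split_stratified_table` utility for separating the two; the sketch below only illustrates the idea with `awk` on a tiny made-up table. The feature IDs, values, and function names are illustrative assumptions, not real output.

```shell
# Build a tiny made-up table in the genefamilies layout (IDs and values are illustrative).
demo_table="$(mktemp)"
{
    printf '# Gene Family\tsampleA_Abundance-RPKs\n'
    printf 'UNMAPPED\t10.0\n'
    printf 'UniRef90_demo1\t30.0\n'
    printf 'UniRef90_demo1|g__Bacteroides.s__Bacteroides_fragilis\t20.0\n'
    printf 'UniRef90_demo1|unclassified\t10.0\n'
} > "$demo_table"

# Community totals: header plus every row whose feature ID has no "|".
unstratified_rows() {
    awk -F'\t' 'NR == 1 || index($1, "|") == 0' "$1"
}

# Per-species contributions: header plus every row whose feature ID contains "|".
stratified_rows() {
    awk -F'\t' 'NR == 1 || index($1, "|") > 0' "$1"
}

unstratified_rows "$demo_table"
stratified_rows "$demo_table"
```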
Normalization scripts
Humann provides a utility script, humann_renorm_table, to normalize abundances to relative abundance (--units relab) or "copies per million" (--units cpm). It can be run on either the genefamilies or the pathabundance output file:
humann_renorm_table --input ${HUMANN_TABLE} --output cpm_${HUMANN_TABLE} --units cpm
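To make the CPM units concrete, here is a rough `awk` sketch of the arithmetic: each value is divided by the column's community-level total and multiplied by one million. It assumes community-level totalling with special rows such as UNMAPPED included in the sum, which is how we understand the tool's defaults; the function name `cpm_sketch` and the demo table are hypothetical, and unlike the real script it leaves the header untouched. Use `humann_renorm_table` for real analyses.

```shell
# Tiny made-up table (values are illustrative only).
cpm_demo="$(mktemp)"
{
    printf '# Gene Family\tsampleA_Abundance-RPKs\n'
    printf 'UNMAPPED\t10\n'
    printf 'GENE_A\t30\n'
    printf 'GENE_A|s__species1\t20\n'
    printf 'GENE_A|s__species2\t10\n'
    printf 'GENE_B\t60\n'
    printf 'GENE_B|s__species1\t60\n'
} > "$cpm_demo"

# cpm_sketch TABLE: rescale every column so community-level rows sum to 1e6.
cpm_sketch() {
    # The file is read twice: first to sum community-level rows, then to rescale.
    awk -F'\t' -v OFS='\t' '
        NR == FNR {                              # first pass over the file
            if (FNR > 1 && index($1, "|") == 0)  # skip header and stratified rows
                for (i = 2; i <= NF; i++) total[i] += $i
            next
        }
        FNR == 1 { print; next }                 # second pass: keep header as-is
        {
            for (i = 2; i <= NF; i++)
                $i = (total[i] > 0) ? sprintf("%.4f", $i / total[i] * 1e6) : 0
            print
        }
    ' "$1" "$1"
}

cpm_sketch "$cpm_demo"
```

In the demo the community total is 10 + 30 + 60 = 100, so GENE_B (60) becomes 600000 CPM, and the stratified rows are scaled by the same total so that they still add up to their parent feature.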
Merging Humann3 outputs
You can merge the per-sample gene family abundance/pathway abundance outputs into a single table using the script humann_join_tables:
# -o takes the path of the merged table to write (a file, not a directory)
humann_join_tables -i ${HUMANN_DIR} -o ${OUTPUT_TABLE} --file_name genefamilies
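Conceptually, the merge builds one wide table with a row per feature and a column per sample, filling in 0 where a feature is absent from a sample. The `awk` sketch below illustrates that idea on two made-up single-sample tables; `join_sketch` and the demo files are hypothetical, and the row order of the real tool may differ. It is not a replacement for `humann_join_tables`.

```shell
# Two tiny made-up per-sample tables (values are illustrative only).
join_dir="$(mktemp -d)"
printf '# Gene Family\ts1_Abundance-RPKs\nGENE_X\t1.5\nGENE_Y\t2.0\n' > "$join_dir/s1_genefamilies.tsv"
printf '# Gene Family\ts2_Abundance-RPKs\nGENE_Y\t3.0\nGENE_Z\t4.0\n' > "$join_dir/s2_genefamilies.tsv"

# join_sketch FILE...: merge two-column tables into one wide table.
join_sketch() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {                          # header line of each input file
            nfiles++
            if (nfiles == 1) feature_header = $1
            colname[nfiles] = $2            # sample column name
            next
        }
        {
            if (!($1 in seen)) { seen[$1] = 1; order[++nfeat] = $1 }
            value[$1, nfiles] = $2          # remember value per (feature, file)
        }
        END {
            line = feature_header
            for (j = 1; j <= nfiles; j++) line = line OFS colname[j]
            print line
            for (k = 1; k <= nfeat; k++) {  # features in first-seen order
                f = order[k]
                line = f
                for (j = 1; j <= nfiles; j++)
                    line = line OFS (((f, j) in value) ? value[f, j] : 0)
                print line
            }
        }
    ' "$@"
}

join_sketch "$join_dir"/*_genefamilies.tsv
```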
Analysing Humann output
An example of analysing the output of Humann3 using R: