Functional Profiling - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki

6.2.7 Functional Profiling

Once you have taxonomic profiles or assembled contigs, the next step is to characterize the functional potential of your community: which genes and pathways are present, and at what abundance. We’ll cover three popular approaches: HUMAnN3, eggNOG-mapper, and MG-RAST.


A. HUMAnN3 (Pathway Abundance)

Purpose: quantify gene families and metabolic pathways directly from raw reads or assemblies.

Installation

conda activate metagenomics
conda install -c bioconda humann

Workflow

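# 1) Run HUMAnN3 on paired FASTQ (Illumina)
#    HUMAnN takes a single input file and does not use pairing information,
#    so concatenate the two mate files first.
cat trimmed/SampleA_R1.trimmed.fastq.gz trimmed/SampleA_R2.trimmed.fastq.gz \
  > trimmed/SampleA.trimmed.fastq.gz
humann \
  --input trimmed/SampleA.trimmed.fastq.gz \
  --output humann_out/SampleA \
  --threads 8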

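# 2) Join tables across samples
#    Per-sample outputs sit in humann_out/<sample>/, so search subdirectories.
humann_join_tables \
  --input humann_out/ \
  --search-subdirectories \
  --file_name pathabundance \
  --output humann_out/pathabundance.tsv

humann_join_tables \
  --input humann_out/ \
  --search-subdirectories \
  --file_name genefamilies \
  --output humann_out/genefamilies.tsv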

# 3) Normalize to relative abundance
humann_renorm_table \
  --input humann_out/pathabundance.tsv \
  --output humann_out/pathabundance_relab.tsv \
  --units relab

humann_renorm_table \
  --input humann_out/genefamilies.tsv \
  --output humann_out/genefamilies_relab.tsv \
  --units relab

Outputs

  • genefamilies.tsv – abundances of UniRef90 gene families, in reads per kilobase (RPK)

  • pathabundance.tsv – abundances of MetaCyc pathways

  • *_relab.tsv – tables normalized to relative abundance (proportions between 0 and 1, not percentages)
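
To make the normalization step concrete, here is a minimal sketch of what relative-abundance scaling does to a joined table: each value is divided by the total of its sample column. This is an illustration only, not HUMAnN's own implementation. The function name relab_sketch, the toy file, and its values are all made up, and the sketch assumes a tab-delimited table with a single header line and no stratified (per-species) rows.

```shell
# Toy joined table (made-up values): first column is the feature ID,
# remaining columns are samples.
workdir=$(mktemp -d)
printf 'Pathway\tSampleA\tSampleB\n'  > "$workdir/toy_pathabundance.tsv"
printf 'PWY-1\t30\t10\n'             >> "$workdir/toy_pathabundance.tsv"
printf 'PWY-2\t10\t10\n'             >> "$workdir/toy_pathabundance.tsv"

# relab_sketch FILE (hypothetical helper): divide every value by its column sum.
# The file is read twice: pass 1 accumulates column totals,
# pass 2 prints each value as a fraction of its column total.
relab_sketch() {
  awk 'BEGIN { FS = OFS = "\t" }
       FNR == 1  { if (NR != FNR) print; next }   # header: skipped in pass 1, printed in pass 2
       NR == FNR { for (i = 2; i <= NF; i++) total[i] += $i; next }
       { for (i = 2; i <= NF; i++)
           $i = (total[i] > 0 ? sprintf("%.4f", $i / total[i]) : 0)
         print }' "$1" "$1"
}

relab_sketch "$workdir/toy_pathabundance.tsv"
```

In the toy table, SampleA totals 40, so PWY-1 becomes 0.7500 and PWY-2 becomes 0.2500; every sample column then sums to 1. For real data, use humann_renorm_table as shown in the workflow, since it also handles stratified rows.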

B. eggNOG-mapper (Orthology Annotation)

Purpose: annotate predicted proteins (from contigs or MAGs) with orthologous groups, COG categories, GO terms and KEGG pathways.

Installation

conda install -c bioconda eggnog-mapper prodigal

Workflow

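# 1) Predict ORFs on your contigs
mkdir -p eggnog_out   # Prodigal does not create the output directory
prodigal \
  -i assembly/megahit_out/final.contigs.fa \
  -a eggnog_out/proteins.faa \
  -d eggnog_out/genes.fna \
  -p meta

# 2) Run eggNOG-mapper (--output is a file prefix, --output_dir the folder)
emapper.py \
  -i eggnog_out/proteins.faa \
  --itype proteins \
  --output annot \
  --output_dir eggnog_out \
  --cpu 8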

# 3) Parse annotations
#   - eggnog_out/annot.emapper.annotations: tab‐delimited annotation table

Key fields in annot.emapper.annotations

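  • query (#query): protein ID

  • eggNOG OGs (eggNOG_OGs): assigned orthologous groups

  • COG category (COG_category): broad functional classes, one letter per class

  • KEGG KO (KEGG_ko): KEGG Orthology identifiers; pathway identifiers are in KEGG_Pathway

  • GO terms (GOs): Gene Ontology annotations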
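
As a first pass over the annotation table, it can help to tally how many proteins fall into each COG category. The sketch below is one way to do that with awk. It assumes the eggNOG-mapper v2 layout (comment lines starting with ##, a header line starting with #query, a column named COG_category, and - for missing values). The helper name cog_counts and the toy rows are made up for illustration.

```shell
# Toy annotation table (made-up rows, only three of the real columns).
workdir=$(mktemp -d)
{
  printf '## toy emapper output\n'
  printf '#query\tCOG_category\tKEGG_ko\n'
  printf 'gene_1\tC\tko:K00001\n'
  printf 'gene_2\tEG\t-\n'
  printf 'gene_3\tC\tko:K00002,ko:K00003\n'
  printf 'gene_4\t-\t-\n'
} > "$workdir/toy.emapper.annotations"

# cog_counts FILE (hypothetical helper): print "category<TAB>count",
# sorted by category letter. A protein assigned to several categories
# (e.g. "EG") is counted once in each of them.
cog_counts() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^##/     { next }                       # skip comment lines
       /^#query/ { for (i = 1; i <= NF; i++)    # locate the column by name
                     if ($i == "COG_category") col = i
                   next }
       col && $col != "-" && $col != "" {
         for (j = 1; j <= length($col); j++) count[substr($col, j, 1)]++
       }
       END { for (c in count) print c, count[c] }' "$1" | sort
}

cog_counts "$workdir/toy.emapper.annotations"
```

On a real run, point the function at eggnog_out/annot.emapper.annotations. Looking the column up by header name rather than by position keeps the sketch from breaking if the column order differs between eggNOG-mapper releases.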

C. MG-RAST Overview

Purpose: public web service for automated metagenome annotation and comparative analysis.

  1. Register at https://www.mg-rast.org/

  2. Upload FASTQ or assembled contigs via web or FTP

  3. Pipeline automatically performs:

  • Quality control & de‐duplication

  • Gene prediction (FragGeneScan)

  • Functional annotation against M5NR, KEGG, SEED, COG

  • Taxonomic profiling (M5NR)

  4. Retrieve results via web interface or REST API, including:

  • abundance.tsv (functional categories)

  • otu_table.tsv (taxonomic counts)

  • Interactive heatmaps and Krona charts

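# Example: fetch functional abundance via API
# (PROJECT_ID is a placeholder; confirm the exact route and response format
#  against the MG-RAST API documentation before relying on this call)
curl "https://api.mg-rast.org/metagenome/PROJECT_ID/function?source=KO" \
     -o mg_rast_KO_abundance.tsv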

Tip: MG-RAST is a good fit for users who prefer a managed GUI and integrated comparative tools, but it offers less control than local pipelines.