MetaPhlAn - quadram-institute-bioscience/biobakery-2024 GitHub Wiki

What is MetaPhlAn

MetaPhlAn is a computational tool designed for microbial profiling from metagenomic shotgun sequencing data. It can identify and quantify bacteria, archaea, eukaryotes, and viruses at the species level. The latest version, MetaPhlAn 4.1, utilizes a large database of approximately 7.3 million unique clade-specific marker genes derived from over 1.1 million microbial genomes, including both isolate genomes and metagenome-assembled genomes.

Running MetaPhlAn4

Complete documentation for MetaPhlAn4 can be found in the bioBakery wiki.

Pre-requisites

MetaPhlAn4 installation

MetaPhlAn4 is installed on the QIB HPC system in several versions. You can list the currently available packages using the NBI-slurm utility:

source package nbi-slurm
shelf metaphlan

If you can't find the specific version you want to use, you can install MetaPhlAn yourself by following the installation instructions.

Databases

The reference databases are downloaded by the core bioinformatics team and shared for everyone to use:

DB_DIR : "/qib/platforms/Informatics/databases/metaphlan4"

- mpa_vJan21_CHOCOPhlAnSGB_202103
- mpa_vOct22_CHOCOPhlAnSGB_202403
- mpa_vJun23_CHOCOPhlAnSGB_202307

If you can't find your database of interest, don't hesitate to contact us, and we'll download it for you!

Basic Usage

MetaPhlAn can be run on your FASTQ files after QC and human read removal (see this tutorial) as follows:

source package metaphlan__4.1.1
MPA_DB="/qib/platforms/Informatics/transfer/outgoing/databases/humann_db/mpa/mpa_vOct22_CHOCOPhlAnSGB_202212/mpa_vOct22_CHOCOPhlAnSGB_202212"
# Replace YOUR_FILE.fastq and YOUR_OUTPUTFILE.txt with your own input and output paths
metaphlan YOUR_FILE.fastq --input_type fastq -o YOUR_OUTPUTFILE.txt --bowtie2db "${MPA_DB}" --offline

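It is highly recommended to estimate the unclassified fraction of the metagenome with the --unclassified_estimation option. With this option, the relative abundance profile is scaled according to the percentage of reads mapping to a clade in the database, so the share of reads that match no known clade is accounted for in the profile.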

metaphlan YOUR_FILE.fastq --input_type fastq -o YOUR_OUTPUTFILE.txt --bowtie2db "${MPA_DB}" --offline --unclassified_estimation
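
When there are many samples, the same command can be wrapped in a loop. The following is a minimal sketch rather than an official workflow: the folder names `cleaned_reads` and `metaphlan_profiles`, the `*.fastq` naming, the `DRY_RUN` switch, and the helper `profile_name` are all assumptions made for illustration, and `MPA_DB` is a placeholder to be set as in the commands above. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: IN_DIR, OUT_DIR, DRY_RUN and profile_name are hypothetical names.
set -eu

IN_DIR="${IN_DIR:-cleaned_reads}"          # assumed folder of QC'd FASTQ files
OUT_DIR="${OUT_DIR:-metaphlan_profiles}"   # assumed output folder
MPA_DB="${MPA_DB:-/path/to/metaphlan/database}"  # placeholder, set it as shown above
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print commands, 0 = run them

# Derive "<sample>_profile.txt" from a FASTQ path.
profile_name() {
    name=$(basename "$1" .fastq)
    printf '%s_profile.txt\n' "$name"
}

for fq in "$IN_DIR"/*.fastq; do
    [ -e "$fq" ] || continue               # skip when the glob matches nothing
    out="$OUT_DIR/$(profile_name "$fq")"
    if [ "$DRY_RUN" = "1" ]; then
        echo "metaphlan $fq --input_type fastq -o $out --bowtie2db $MPA_DB --offline --unclassified_estimation"
    else
        mkdir -p "$OUT_DIR"
        metaphlan "$fq" --input_type fastq -o "$out" \
            --bowtie2db "$MPA_DB" --offline --unclassified_estimation
    fi
done
```

Check the printed commands first, then rerun with `DRY_RUN=0` once the paths look right. On the HPC you would typically submit the real runs as cluster jobs rather than run them on a login node.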

Merging files

The script merge_metaphlan_tables.py merges the MetaPhlAn output from several samples into one table:

merge_metaphlan_tables.py metaphlan_output1.txt metaphlan_output2.txt > merged_abundance_table.txt
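
A merged table contains every taxonomic level, so a common next step is to keep only the species-level rows. The sketch below is one possible way to do that, not part of MetaPhlAn itself: the helper `species_rows`, the folder `metaphlan_profiles`, and the output file names are hypothetical, and it assumes the merged table has `#` comment lines, a header row starting with `clade_name`, and tab-separated columns.

```shell
#!/bin/sh
# Sketch only: species_rows and the file names below are hypothetical.

# Print the header row plus species-level rows of a merged table.
# Species rows contain "s__" but not the SGB level "t__".
species_rows() {
    awk -F '\t' '
        /^#/ { next }                                   # drop comment lines
        $1 == "clade_name" || ($1 ~ /s__/ && $1 !~ /t__/)
    ' "$1"
}

# Merge all per-sample profiles, then filter; only runs where the script is available.
if command -v merge_metaphlan_tables.py >/dev/null 2>&1; then
    merge_metaphlan_tables.py metaphlan_profiles/*_profile.txt > merged_abundance_table.txt
    species_rows merged_abundance_table.txt > merged_abundance_table_species.txt
fi
```

The filter keeps the full clade name in the first column; trim it afterwards if you only want the species label.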

Converting SGB profiles to the GTDB taxonomy

The script sgb_to_gtdb_profile.py converts an SGB-based MetaPhlAn output into a GTDB-taxonomy-based profile:

sgb_to_gtdb_profile.py -i metaphlan_output.txt -o metaphlan_output_gtdb.txt
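
To convert a whole folder of per-sample profiles, the command can be looped in the same way. This is a hedged sketch: the folder `metaphlan_profiles`, the `*_profile.txt` naming, the `DRY_RUN` switch, and the helper `gtdb_name` are assumptions for illustration. Only the `-i` and `-o` options shown above are used.

```shell
#!/bin/sh
# Sketch only: PROFILE_DIR, DRY_RUN and gtdb_name are hypothetical names.
set -eu

PROFILE_DIR="${PROFILE_DIR:-metaphlan_profiles}"  # assumed folder of SGB-based profiles
DRY_RUN="${DRY_RUN:-1}"                           # 1 = print commands, 0 = run them

# Turn "sample_profile.txt" into "sample_profile_gtdb.txt".
gtdb_name() {
    printf '%s_gtdb.txt\n' "${1%.txt}"
}

for profile in "$PROFILE_DIR"/*_profile.txt; do
    [ -e "$profile" ] || continue                 # skip when the glob matches nothing
    out=$(gtdb_name "$profile")
    if [ "$DRY_RUN" = "1" ]; then
        echo "sgb_to_gtdb_profile.py -i $profile -o $out"
    else
        sgb_to_gtdb_profile.py -i "$profile" -o "$out"
    fi
done
```

Because the converted files end in `_gtdb.txt`, they are not picked up again by the `*_profile.txt` pattern on a second run.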

Analysing MetaPhlAn4 output

An example of analysing the output of MetaPhlAn4 using R and the miaverse: