MetaPhlAn2 - ACHG2018/metagenomics-classification-tools GitHub Wiki

Installation

hg clone https://bitbucket.org/biobakery/metaphlan2

The code does not need to be compiled. To run it, first change into the cloned directory:

cd metaphlan2

To start the database download:

./metaphlan2.py --input_type fastq

The script will automatically start downloading and building the mpa_v20_m200 database.

Run an example sequence on MetaPhlAn2

The example sequences were downloaded with the following commands:

cd metaphlan2

mkdir TestData

cd TestData

curl -O https://bitbucket.org/biobakery/biobakery/raw/tip/demos/biobakery_demos/data/metaphlan2/input/SRS014476-Supragingival_plaque.fasta.gz

curl -O https://bitbucket.org/biobakery/biobakery/raw/tip/demos/biobakery_demos/data/metaphlan2/input/SRS014494-Posterior_fornix.fasta.gz

curl -O https://bitbucket.org/biobakery/biobakery/raw/tip/demos/biobakery_demos/data/metaphlan2/input/SRS014459-Stool.fasta.gz

gzip -d SRS014476-Supragingival_plaque.fasta.gz
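Only the first archive is decompressed above. As a rough sketch, all downloaded archives could be unpacked in one pass with Python's standard library. The helper name `decompress_all` is hypothetical, and unlike `gzip -d` it leaves the `.gz` files in place.

```python
import gzip
import shutil
from pathlib import Path


def decompress_all(directory="."):
    """Decompress every *.fasta.gz file in `directory`, keeping the originals.

    Returns the list of decompressed file paths.
    """
    outputs = []
    for gz_path in sorted(Path(directory).glob("*.fasta.gz")):
        out_path = gz_path.with_suffix("")  # drop only the trailing ".gz"
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        outputs.append(out_path)
    return outputs
```

Calling `decompress_all()` from inside TestData would write one `.fasta` file next to each downloaded archive.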

To perform a test run, run the following command:

../metaphlan2.py SRS014476-Supragingival_plaque.fasta --input_type fasta > plaque_profile.txt
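The same invocation could be repeated for every decompressed sample. The sketch below is one possible wrapper, assuming it is run from TestData and that each `.fasta` file has already been decompressed. The function names and the `<sample>_profile.txt` naming scheme are illustrative choices, not part of MetaPhlAn2.

```python
import subprocess
from pathlib import Path


def build_command(fasta_path, script="../metaphlan2.py"):
    """Return the argument list used in the test run above for one FASTA file."""
    return [script, str(fasta_path), "--input_type", "fasta"]


def profile_name(fasta_path):
    """Derive an output file name from the sample name (hypothetical scheme)."""
    return Path(fasta_path).stem + "_profile.txt"


def run_all(directory=".", script="../metaphlan2.py"):
    """Profile every *.fasta file in `directory`, one output file per sample."""
    for fasta in sorted(Path(directory).glob("*.fasta")):
        out_path = Path(directory) / profile_name(fasta)
        with open(out_path, "w") as out:
            # stdout is redirected to the profile file, like "> plaque_profile.txt"
            subprocess.run(build_command(fasta, script), stdout=out, check=True)
```

Nothing is executed until `run_all()` is called; `check=True` makes a failed run raise an error instead of leaving a silently empty profile.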

Result and Evaluation

The result is shown below:

k__Bacteria	100.0
k__Bacteria|p__Firmicutes	100.0
k__Bacteria|p__Firmicutes|c__Clostridia	100.0
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales	100.0
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Eubacteriaceae	100.0
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Eubacteriaceae|g__Eubacterium	100.0
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_siraeum	100.0
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_siraeum|t__Eubacterium_siraeum_unclassified	100.0
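For downstream use, a profile in this form can be read into a dictionary. The following is a minimal sketch that assumes tab-separated lines of clade and abundance as shown above, and skips any line starting with `#` in case the file carries a header. The function names are hypothetical.

```python
def parse_profile(text):
    """Map each clade string to its abundance value."""
    profile = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        clade, abundance = line.split("\t")[:2]
        profile[clade] = float(abundance)
    return profile


def species_only(profile):
    """Keep only rows whose deepest level is a species (s__ prefix)."""
    result = {}
    for clade, abundance in profile.items():
        deepest = clade.split("|")[-1]
        if deepest.startswith("s__"):
            result[deepest] = abundance
    return result
```

Applied to the output above, `species_only(parse_profile(text))` would keep the single `s__Eubacterium_siraeum` row and drop the higher levels and the `t__` row.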

The profile above is the result of aligning the test sequence against the database; each line lists a clade and its relative abundance (in percent). The database is organized by the standard taxonomic hierarchy:

Kingdom -> Phylum -> Class -> Order -> Family -> Genus -> Species
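Each clade string in the profile encodes this hierarchy through its prefixes (`k__`, `p__`, and so on). The sketch below splits one clade string into named levels. The `RANKS` table and `split_lineage` are illustrative names, and the `t__` prefix seen in the output is treated here as a strain-level entry below species.

```python
# Prefix-to-rank table, following the hierarchy above plus the t__ level
# that appears in the profile output.
RANKS = {
    "k": "Kingdom",
    "p": "Phylum",
    "c": "Class",
    "o": "Order",
    "f": "Family",
    "g": "Genus",
    "s": "Species",
    "t": "Strain",
}


def split_lineage(clade):
    """Turn 'k__Bacteria|p__Firmicutes' into {'Kingdom': 'Bacteria', ...}."""
    lineage = {}
    for part in clade.split("|"):
        prefix, name = part.split("__", 1)
        lineage[RANKS[prefix]] = name
    return lineage
```

For example, `split_lineage("k__Bacteria|p__Firmicutes")` yields a dictionary with `Kingdom` set to `Bacteria` and `Phylum` set to `Firmicutes`.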

In the official database, each level of the taxonomy has its own reference sequences. To construct a customized database, we would need sequence information for every taxonomic level. At this point, that is unlikely to be cost-efficient; therefore, this tool does not fit our needs.

The program also has many utilities for visualizing data, such as heatmaps and taxonomic mapping. However, we do not need such extensive visualization for our project.