MetaPhlAn2 - ACHG2018/metagenomics-classification-tools GitHub Wiki
Installation
hg clone https://bitbucket.org/biobakery/metaphlan2
The code does not need to be compiled. It can be run directly from the cloned directory:
cd metaphlan2
To start the database download, run:
./metaphlan2.py --input_type fastq
The script will automatically start downloading and building the mpa_v20_m200 database.
Run an example sequence with MetaPhlAn2
The example sequences were downloaded with the following commands:
cd metaphlan2
mkdir TestData
cd TestData
curl -O https://bitbucket.org/biobakery/biobakery/raw/tip/demos/biobakery_demos/data/metaphlan2/input/SRS014476-Supragingival_plaque.fasta.gz
curl -O https://bitbucket.org/biobakery/biobakery/raw/tip/demos/biobakery_demos/data/metaphlan2/input/SRS014494-Posterior_fornix.fasta.gz
curl -O https://bitbucket.org/biobakery/biobakery/raw/tip/demos/biobakery_demos/data/metaphlan2/input/SRS014459-Stool.fasta.gz
gzip -d SRS014476-Supragingival_plaque.fasta.gz
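Only the plaque sample is decompressed above; the other two downloads stay gzipped. As a minimal sketch, assuming all three files should end up as plain FASTA in the same directory, the step could be repeated with Python's standard library. The helper name `gunzip_all` is hypothetical and not part of MetaPhlAn2. Unlike `gzip -d`, it keeps the `.gz` originals.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(directory, pattern="*.fasta.gz"):
    """Decompress every file matching `pattern`, keeping the .gz originals.

    Hypothetical helper for illustration; not part of MetaPhlAn2.
    """
    written = []
    for src in sorted(Path(directory).glob(pattern)):
        dst = src.with_suffix("")  # drop the trailing ".gz"
        with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(dst)
    return written
```

Calling `gunzip_all(".")` from inside TestData would write one `.fasta` file next to each downloaded `.fasta.gz` file.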
To perform a test run, run the following command:
../metaphlan2.py SRS014476-Supragingival_plaque.fasta --input_type fasta > plaque_profile.txt
Result and Evaluation
The result is shown below:
k__Bacteria 100.0
k__Bacteria|p__Firmicutes 100.0
k__Bacteria|p__Firmicutes|c__Clostridia 100.0
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales 100.0
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Eubacteriaceae 100.0
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Eubacteriaceae|g__Eubacterium 100.0
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_siraeum 100.0
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_siraeum|t__Eubacterium_siraeum_unclassified 100.0
The profile was produced by aligning the test sequences against the database. The database follows the standard taxonomic hierarchy:
Kingdom -> Phylum -> Class -> Order -> Family -> Genus -> Species
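Each result line encodes this hierarchy as `|`-separated entries with a one-letter rank prefix, followed by the relative abundance. The sketch below shows one way to split such lines into ranks and keep only the entries at a chosen level. The function names are hypothetical, the `t__` prefix is treated as a strain-level entry because it appears below species in the result above, and lines starting with `#` are assumed to be headers.

```python
# Rank prefixes as they appear in the result lines; "t" is the entry
# below species in the last line (assumed here to be strain level).
RANKS = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species", "t": "strain",
}


def parse_profile_line(line):
    """Split one profile line into ({rank: name}, abundance)."""
    clade, abundance = line.split()
    lineage = {}
    for part in clade.split("|"):
        prefix, name = part.split("__", 1)
        lineage[RANKS[prefix]] = name
    return lineage, float(abundance)


def clades_at_rank(lines, rank):
    """Return {name: abundance} for lines whose most specific rank is `rank`."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and header lines
            continue
        lineage, abundance = parse_profile_line(line)
        if list(lineage)[-1] == rank:  # dicts keep insertion order
            result[lineage[rank]] = abundance
    return result


example = [
    "k__Bacteria 100.0",
    "k__Bacteria|p__Firmicutes 100.0",
    "k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Eubacteriaceae|g__Eubacterium 100.0",
    "k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_siraeum 100.0",
]
print(clades_at_rank(example, "species"))  # {'Eubacterium_siraeum': 100.0}
```

Reading the real file would amount to passing `open("plaque_profile.txt")` as `lines`.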
In the official database, each level of the taxonomy has its own reference sequences. To construct a customized database, we would need sequence information for every taxonomic level. At this point, that is unlikely to be cost-efficient; therefore, this tool does not fit our needs.
The program also has many utilities for visualizing data with heatmaps and taxonomy mapping. However, we do not need such extensive visualization for our project.