Metagenomic analysis of the Ethiopian cohort - biobakery/biobakery GitHub Wiki

Note: This page uses a reduced-size set of MAGs and a correspondingly reduced database so that the commands finish within a normal tutorial window. To see the full version of the commands, please see the main PhyloPhlAn tutorial.


This tutorial will show you how to phylogenetically characterize newly assembled genomes from metagenomes in the context of Species-level Genome Bins (SGBs).

To do this, we use 50 metagenomes from the Ethiopian cohort, from which 369 MAGs were reconstructed (>50% completeness and <5% contamination, based on CheckM).

Note: Before starting, make sure to have PhyloPhlAn 3 installed.

1. Setup for PhyloPhlAn metagenomic run

Ingredients you will need to run PhyloPhlAn metagenomic include:

  1. A directory with contigs (genome bins / MAGs) from your metagenomic study (for a tutorial on the basics of assembly see here)
  2. A database of SGBs to pull annotations from

Ingredients you will need to run PhyloPhlAn include:

  1. Reference genomes
  2. Genome bins / MAGs assigned to each phylogeny of interest
  3. Database with annotated marker genes (see Database setup)
  4. Configuration file (How to make a configuration file)

1.1 Download the Ethiopian MAGs setup files

If not on a VM, pull the script to do this from GitHub:

wget https://github.com/biobakery/biobakery/releases/download/1.8/setup.sh

Let's run the setup.sh to set up our environment to run PhyloPhlAn on metagenomic samples.

sh setup.sh

2. Running PhyloPhlAn metagenomic

2.1 Assign a taxonomic label to each bin

With the following command, we will use the SGB release of January 2020 to assign to each genome bin its closest SGB.

Reminder: this is a reduced-size database. If you want to run against the full database, please use the latest full-size edition located here.

phylophlan_assign_sgbs \
    -i tutorial_ethiopia/ethiopian_mags \
    -o tutorial_ethiopia/ethiopian_mags \
    --nproc 4 \
    -n 1 \
    -d ethiopia_tutorial \
    --database_folder ethiopia_tutorial_db \
    --verbose 2>&1 | tee logs/phylophlan_metagenomic.log

In this case, for each genome bin, we are interested in only the closest SGB (-n 1), which is reported in the output. If the genome bin has a Mash distance <2% from the reported SGB, we can consider that bin as part of it and transfer the SGB's taxonomic label.
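The thresholding rule above can be sketched with awk. This is only an illustration on a toy table, not the tool's own logic: the column layout (bin name, then a colon-separated field ending in the average Mash distance expressed as a fraction) and all bin, SGB, and taxonomy names are assumptions. Check the header line of your own ethiopian_mags.tsv before adapting it.

```shell
# Toy table standing in for the real output file.
# ASSUMPTION: column 2 looks like "SGB_id:level:taxonomy:avg_dist".
tmp=$(mktemp -d)
printf '#input_bin\tclosest_sgb\n'                >  "$tmp/toy.tsv"
printf 'binA\tkSGB_1:Species:taxonomy_A:0.011\n'  >> "$tmp/toy.tsv"
printf 'binB\tuSGB_2:Species:taxonomy_B:0.094\n'  >> "$tmp/toy.tsv"
printf 'binC\tkSGB_3:Species:taxonomy_C:0.019\n'  >> "$tmp/toy.tsv"

# 2% Mash distance, written as a fraction (also an assumption about units).
threshold=0.02

# Keep only bins whose distance to the closest SGB is below the threshold
# and print "bin <TAB> SGB id" for each of them.
assigned=$(awk -F '\t' -v t="$threshold" '
    /^#/ { next }                       # skip header lines
    {
        n = split($2, f, ":")           # last element is the distance
        if ((f[n] + 0) < (t + 0)) print $1 "\t" f[1]
    }' "$tmp/toy.tsv")

printf '%s\n' "$assigned"
```

On the toy table, binA and binC pass the cutoff and binB does not, so only the first two would inherit the taxonomic label of their SGB.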

What does the output of this command look like?

less -S tutorial_ethiopia/ethiopian_mags/ethiopian_mags.tsv

2.2 Heatmaps of the top 21 SGBs found in the Ethiopian metagenomes

This step allows you to visualize the top 21 SGBs found in the Ethiopian metagenomes.

To be able to do this, you need to provide a mapping file that maps each genome bin to the metagenome it was assembled from. The mapping file should be a tab-separated text file where the genome bins / MAGs are listed in the first column and the corresponding metagenome in the second column.
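Before plotting, it can help to sanity-check the mapping file. The sketch below builds a small toy mapping (the bin and metagenome names are made up), counts lines that do not have exactly two non-empty tab-separated columns, and tallies how many MAGs map to each metagenome. It assumes the file has no header line.

```shell
# Toy mapping file: MAG in column 1, metagenome in column 2 (hypothetical names).
tmp_map=$(mktemp)
printf 'bin_1\tmetagenome_A\n' >  "$tmp_map"
printf 'bin_2\tmetagenome_A\n' >> "$tmp_map"
printf 'bin_3\tmetagenome_B\n' >> "$tmp_map"

# Count lines that are not exactly two non-empty, tab-separated columns.
bad_lines=$(awk -F '\t' 'NF != 2 || $1 == "" || $2 == "" { c++ }
                         END { print c + 0 }' "$tmp_map")
echo "malformed lines: $bad_lines"

# Number of MAGs per metagenome, printed as "metagenome <TAB> count".
per_meta=$(cut -f 2 "$tmp_map" | LC_ALL=C sort | uniq -c | awk '{ print $2 "\t" $1 }')
printf '%s\n' "$per_meta"
```

To check the tutorial file instead, point the awk and cut commands at tutorial_ethiopia/tutorial_ethiopia__mag2meta.tsv.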

For this example, we provide the mapping file tutorial_ethiopia__mag2meta.tsv, located inside the tutorial_ethiopia folder. To view this file, run column -t -s $'\t' tutorial_ethiopia/tutorial_ethiopia__mag2meta.tsv | less -S, then press q to exit.

phylophlan_draw_metagenomic \
    -i tutorial_ethiopia/ethiopian_mags/ethiopian_mags.tsv \
    --map tutorial_ethiopia/tutorial_ethiopia__mag2meta.tsv \
    -f png \
    --verbose 2>&1 | tee phylophlan_draw_metagenomic.log

This will produce two heatmaps:

  1. The first heatmap shows, for each metagenome, the presence/absence profile of the top 21 SGBs found in the Ethiopian cohort
  2. The second heatmap shows how many uSGBs, kSGBs, and unassigned bins / MAGs are present in each metagenome
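To get a feel for the numbers behind the second heatmap, the assignment table and the mapping file can be joined by bin name. The sketch below is a rough illustration on toy data, not what phylophlan_draw_metagenomic does internally: it assumes SGB ids start with "k" (known) or "u" (unknown), uses made-up bin and metagenome names, and does not handle unassigned bins.

```shell
work=$(mktemp -d)

# Toy assignment table (bin, closest-SGB field) and toy mapping (bin, metagenome).
# ASSUMPTION: the SGB id prefix is "kSGB" or "uSGB".
printf 'bin_1\tkSGB_1:Species:tax_A:0.011\n' >  "$work/assign.tsv"
printf 'bin_2\tuSGB_2:Species:tax_B:0.015\n' >> "$work/assign.tsv"
printf 'bin_3\tkSGB_1:Species:tax_A:0.018\n' >> "$work/assign.tsv"
printf 'bin_1\tmeta_A\nbin_2\tmeta_A\nbin_3\tmeta_B\n' > "$work/map.tsv"

# Read the mapping first (bin -> metagenome), then count kSGB / uSGB bins
# per metagenome while reading the assignment table.
counts=$(awk -F '\t' '
    NR == FNR { meta[$1] = $2; next }
    /^#/      { next }
    {
        kind = (substr($2, 1, 1) == "k") ? "kSGB" : "uSGB"
        n[meta[$1] "\t" kind]++
    }
    END { for (key in n) print key "\t" n[key] }
' "$work/map.tsv" "$work/assign.tsv" | LC_ALL=C sort)

printf '%s\n' "$counts"
```

Each output line has the form metagenome, SGB type, count, which is the same kind of tally the second heatmap displays per metagenome.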

Figure: PhyloPhlAn 3, Example 03 (Metagenomic application): presence/absence heatmap

Figure: PhyloPhlAn 3, Example 03 (Metagenomic application): counts of uSGBs, kSGBs, and unassigned bins heatmap

Where do we go from here?

The SGB profiles of the Ethiopian cohort can be analyzed further by focusing on specific known and/or unknown SGBs.

For instance, if we focus on the common gut commensal Escherichia coli, we can put the 8 Ethiopian MAGs falling into kSGB 10068 into phylogenetic context, as shown in "4. High-resolution phylogeny of genomes and MAGs of a known species (E. coli)" or in the reduced version below.

3. PhyloPhlAn to phylogenetically place MAGs

3.1 E. coli in the Ethiopian MAGs

We won't run it here for lack of time, but PhyloPhlAn example 4 shows how to build a genome tree from Escherichia coli genomes using PhyloPhlAn. It produces the following outputs:

example_output/ecoli.tre
example_output/ecoli_concatenated.aln

You can visualize the tree using the ggtree script (the same way that we did in StrainPhlAn).

cd example_output
./phylophlan_ggtree.R ecoli.tre ecoli_concatenated.aln ecoli_tree1.png ecoli_tree2.png