Assignment - GeertsManon/EEG_Metagenomics GitHub Wiki

Context

In the previous practical session, you worked with a single cave metagenomic sample (DRC_cave_A) and manually assembled 5 high-quality bacterial genomes. Now it's time to expand your analysis!

Instead of analyzing just one cave location, you'll now compare three different sampling sites from the same cave system in the Democratic Republic of Congo:

  • DRC_cave_A (your original sample)
  • DRC_cave_B
  • DRC_cave_C

Interactive interface

To save time, all the coding has already been done for you. All you need to do is go to the correct directory and launch the interactive interface:

cd $VSC_DATA/EEG_metagenomics/assignment

anvi-interactive -p MERGED_PROFILE/PROFILE.db -c contigs.db --port XXXXX

Make sure to replace the X's with the digits from your personal VSC username (the port must be a number).

Then, open your web browser and navigate to the URL (http://localhost:XXXXX) shown on your screen.

If you need to create a new nested SSH tunnel, please follow the exact steps as described here

Settings

Instead of one coverage layer, you'll see three coverage layers - one for each cave location (A, B, C). This allows you to directly compare the abundance of each organism across the three sites.

Go to the Main tab and adjust the following settings (don't forget to click Draw):

This displays a proportional barplot based on the Ribosomal S6 gene. Why Ribosomal S6? During our anvi-estimate-scg-taxonomy step, Anvi'o identified it as the most frequently detected single-copy core gene (SCG) across all contigs, making it the best single marker for estimating community composition. The barplot therefore shows the relative proportion of each taxonomic family in each of your samples.
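To make "proportional" concrete, here is a minimal sketch of how per-family values can be turned into per-sample proportions. It is not how Anvi'o computes the barplot internally. The function name `proportions`, the family names, and all numbers are made up for illustration, and the input is assumed to be a tab-separated table with one row per family and one column per sample.

```shell
#!/usr/bin/env bash
# Toy example: convert per-family values into per-sample proportions.
# Family names and numbers are invented; they are NOT results from the cave data.

proportions() {
  # $1 = tab-separated file: header line, then one row per family,
  #      first column = family name, remaining columns = one value per sample.
  awk -F'\t' -v OFS='\t' '
    NR == 1 { print; next }                      # keep the header as-is
    {
      name[NR] = $1
      for (i = 2; i <= NF; i++) { val[NR, i] = $i; total[i] += $i }
      ncol = NF; nrow = NR
    }
    END {
      for (r = 2; r <= nrow; r++) {
        line = name[r]
        for (i = 2; i <= ncol; i++) {
          # each value divided by its column (sample) total
          frac = (total[i] > 0) ? val[r, i] / total[i] : 0
          line = line OFS sprintf("%.2f", frac)
        }
        print line
      }
    }
  ' "$1"
}

# Demo on a throwaway file with made-up values.
demo=$(mktemp)
printf 'family\tcave_A\tcave_B\tcave_C\n' >  "$demo"
printf 'Family_X\t30\t10\t0\n'            >> "$demo"
printf 'Family_Y\t10\t10\t5\n'            >> "$demo"
result=$(proportions "$demo")
printf '%s\n' "$result"
rm -f "$demo"
```

Because every column is divided by its own total, the proportions within each sample sum to 1, which is why the bars in a proportional barplot all have the same height regardless of sequencing depth.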

This visualization helps you understand:

  • Which taxonomic families dominate the cave microbiome
  • Whether the same families are abundant across all three caves
  • How community composition varies spatially within the cave system

Assignment

Question 1

Identify your five previously designated bins in this multi-sample view. Add a screenshot of your bin tab.

Question 2

Expand your previous table:

| Bin Name | Taxonomy (full lineage) | Identified to | Completeness (%) | Contamination (%) | Genome Size (bp) | Average coverage of Ribosomal S6 gene in cave A | Average coverage of Ribosomal S6 gene in cave B | Average coverage of Ribosomal S6 gene in cave C |
|---|---|---|---|---|---|---|---|---|
| Bin_1 | Bacteria; Pseudomonadota; Gammaproteobacteria; Methylococcales; Methylomonadaceae; Methyloglobulus; sp016874115 | species level | 98.6 | 0.0 | 3,030,873 | 62.76x | ... | ... |
| Bin_2 | ... | ... | ... | ... | ... | ... | ... | ... |
| Bin_3 | ... | ... | ... | ... | ... | ... | ... | ... |
| Bin_4 | ... | ... | ... | ... | ... | ... | ... | ... |
| Bin_5 | ... | ... | ... | ... | ... | ... | ... | ... |

💡 Tip: Check your previously created table BIN_SUMMARY/bins_summary.txt.

💡 Tip: Explore the MERGED_taxonomy_summary.txt, which has been generated for you.
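If you prefer to explore these files on the command line, the sketch below may help. It assumes both files are tab-separated with a header line, which you should verify yourself. The helper names `list_columns` and `pick_columns` are hypothetical, and the demo file uses made-up column names and values.

```shell
#!/usr/bin/env bash
# Small helpers for inspecting a tab-separated summary file.
# Assumption: the file has a single header line and uses tabs as separators.

list_columns() {
  # Print the header fields one per line, prefixed with their column number.
  head -n 1 "$1" | tr '\t' '\n' | awk '{ print NR "\t" $0 }'
}

pick_columns() {
  # $1 = file, $2 = column list as understood by cut -f (e.g. 1,3 or 2-4)
  cut -f "$2" "$1"
}

# Demo on a throwaway file (column names and values are invented).
demo=$(mktemp)
printf 'name\tvalue_1\tvalue_2\n' >  "$demo"
printf 'Bin_1\t42\t7\n'           >> "$demo"
columns_out=$(list_columns "$demo")
picked=$(pick_columns "$demo" 1,3)
printf '%s\n' "$columns_out"
printf '%s\n' "$picked"
rm -f "$demo"
```

On the real data you could, for example, run `list_columns BIN_SUMMARY/bins_summary.txt` from the assignment directory to see which column numbers you need, and then pass those numbers to `pick_columns`.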

Question 3

Which organisms, represented by high-quality genomes, are found in all three caves? Are there any organisms missing from specific locations? Are there organisms exclusive to certain caves? Answer in no more than three sentences.

Question 4

The Ribosomal S6 family barplot also includes medium- and low-quality genomes, so it shows greater diversity than the high-quality genomes alone. Based on this barplot, which cave exhibits the highest diversity, and which one the lowest? Answer in no more than three sentences.
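The question can be answered by eye from the barplot, but if you want to put a number on "diversity", two common summaries are richness (how many families are present) and the Shannon index, H = -Σ p_i ln(p_i), where p_i is the proportion of family i in a sample. The sketch below is only an illustration under assumptions: the function name `shannon` is hypothetical, the input is assumed to be a tab-separated table with one column per sample, and the numbers are made up.

```shell
#!/usr/bin/env bash
# Richness and Shannon index for one sample column of a tab-separated table.
# Toy values only; they are NOT results from the cave data.

shannon() {
  # $1 = tab-separated file with a header line, $2 = column number of the sample
  awk -F'\t' -v col="$2" '
    NR > 1 && $col > 0 { v[NR] = $col; total += $col; rich++ }
    END {
      h = 0
      for (k in v) { p = v[k] / total; h -= p * log(p) }   # log() is the natural log
      printf "richness=%d shannon=%.3f\n", rich, h
    }
  ' "$1"
}

# Demo on a throwaway file with made-up values.
demo=$(mktemp)
printf 'family\tcave_A\tcave_B\tcave_C\n' >  "$demo"
printf 'Family_X\t30\t10\t0\n'            >> "$demo"
printf 'Family_Y\t10\t10\t5\n'            >> "$demo"
h_a=$(shannon "$demo" 2)
h_b=$(shannon "$demo" 3)
h_c=$(shannon "$demo" 4)
printf '%s\n%s\n%s\n' "$h_a" "$h_b" "$h_c"
rm -f "$demo"
```

In this toy table the second sample has the highest Shannon index because its two families are equally abundant, while the third sample contains a single family and scores zero. Richness and evenness can disagree, so state which one you mean in your answer.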

Question 5

Based on your previous findings, hypothesize how the underground river flows through these caves. Sketch a simple flow diagram (for example, A → B and A → C, or C → A and B → A) and explain your reasoning. Answer in no more than five sentences.