CBW 2021 Metagenomic Taxonomic and Functional Composition Tutorial Answers - LangilleLab/microbiome_helper GitHub Wiki
These are the answers for the Metagenomic Taxonomic and Functional Composition Tutorial created for the 2021 microbiome data analysis Canadian Bioinformatics Workshop.
-
There should be the same number of reads in the reverse FASTQ for each sample. Because each FASTQ record spans four lines, there should be 100000/4 = 25000 reads for sample CSM79HR8 and 100400/4 = 25100 reads for sample HSM7J4QT.
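As a rough sketch of the arithmetic above, the read count can be derived from the line count of a FASTQ file. The file name and the three reads below are made up purely for illustration; they are not part of the tutorial data.

```shell
# Build a tiny, made-up FASTQ with 3 reads (4 lines per read) for illustration.
tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n@r3\nGGCC\n+\nIIII\n' > "$tmpdir/example_R2.fastq"

# Each FASTQ record spans exactly 4 lines, so reads = lines / 4.
lines=$(wc -l < "$tmpdir/example_R2.fastq")
reads=$((lines / 4))
echo "$reads reads"
```

For a gzipped FASTQ, the same idea should work by piping the decompressed stream into `wc -l` first.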
-
Only 119 reads were removed due to matching the human and/or PhiX genomes, which again highlights that this data has already been stringently filtered.
-
Researchers have different preferences and opinions about how raw data should be pre-processed. It's important to upload all of the raw data so that your work is fully reproducible and so that others can apply different pipelines.
-
The text `{/.}.kraken.txt` and `{/.}.kreport` is what is used to indicate the names of the output files from our kraken2 command. Remember that the text `{}` is replaced by the input file read by the argument at the end of our command: `::: cat_reads/*.fastq`. Including `/.` within `{}` means we want to strip the directory path and the file extension from the input file, keeping only the bare file name rather than its whole PATH.
-
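To illustrate what `{/.}` does to each input file, here is a minimal sketch that mimics it with plain shell parameter expansion rather than GNU parallel itself. The input path is an assumption, shaped like a file matched by `cat_reads/*.fastq`.

```shell
# Hypothetical input path, shaped like the files matched by cat_reads/*.fastq.
input="cat_reads/CSM79HR8.fastq"

# Equivalent of {/}: strip the directory part.
base="${input##*/}"      # CSM79HR8.fastq
# Equivalent of {/.}: strip the directory part and the extension.
stem="${base%.*}"        # CSM79HR8

# These are the output names the kraken2 command would end up using.
kraken_out="${stem}.kraken.txt"
kreport_out="${stem}.kreport"
echo "$kraken_out"
echo "$kreport_out"
```

Without the `/.`, the output names would contain the `cat_reads/` directory and the `.fastq` extension.
-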
We could figure out the total number of taxa that were identified in all of our samples using the `wc -l` command and subtracting one (for the header line): `wc -l bracken_out_merged/merged_output.species.bracken`. This results in a total of 248 taxa being identified.
-
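One way to avoid the manual subtraction is to skip the header before counting. This is only a sketch: the table below is made up (its column names and values are not the real Bracken output), and it just shows the counting pattern.

```shell
tmpdir=$(mktemp -d)
# Made-up table: one header line plus three taxa (names and values are illustrative only).
printf 'name\tsampleA_num\tsampleB_num\nTaxon one\t10\t0\nTaxon two\t5\t7\nTaxon three\t0\t2\n' > "$tmpdir/merged_example.bracken"

# tail -n +2 starts output at line 2, skipping the header, so wc -l counts only taxa rows.
n_taxa=$(tail -n +2 "$tmpdir/merged_example.bracken" | wc -l)
echo "$n_taxa"
```

Run on the real merged table, the same pattern should report the number of taxa directly.
-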
The sequence `HKWJVBCXY170606:2:2116:8029:10262/1` aligned with 25 protein sequences (the maximum number we allowed in our mmseqs command). There are four protein sequences that share the highest bitscore/lowest E-value with this sequence:
UniRef90_A7LXV1
UniRef90_A7LXV1
UniRef90_A0A174F8K1
UniRef90_A0A1F0I3S4
-
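The top-scoring hits for a read can also be pulled out programmatically. The sketch below assumes the alignment results are in a 12-column BLAST-tab layout with the bitscore in column 12; the file name, read names and target names are all made up for illustration.

```shell
tmpdir=$(mktemp -d)
# Made-up alignment table: query, target, ..., E-value (column 11), bitscore (column 12).
printf 'readA\ttargetX\t0.90\t50\t5\t0\t1\t150\t1\t50\t1e-20\t95\n'  > "$tmpdir/hits.m8"
printf 'readA\ttargetY\t0.90\t50\t5\t0\t1\t150\t1\t50\t1e-20\t95\n' >> "$tmpdir/hits.m8"
printf 'readA\ttargetZ\t0.80\t50\t10\t0\t1\t150\t1\t50\t1e-15\t80\n' >> "$tmpdir/hits.m8"
printf 'readB\ttargetX\t0.95\t50\t2\t0\t1\t150\t1\t50\t1e-22\t99\n' >> "$tmpdir/hits.m8"

query="readA"
# The file is read twice: pass 1 finds the best bitscore for the query,
# pass 2 prints every target whose bitscore ties with that best value.
top_hits=$(awk -F'\t' -v q="$query" '
    NR == FNR { if ($1 == q && $12 + 0 > max) max = $12 + 0; next }
    $1 == q && $12 + 0 == max { print $2 }
' "$tmpdir/hits.m8" "$tmpdir/hits.m8")
echo "$top_hits"
```

If the real output uses a different column order, the column numbers in the `awk` program would need to be adjusted.
-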
The RPKM of EC 2.1.2.9 (Methionyl-tRNA formyltransferase) contributed by Bacteroides vulgatus is 985.117.
-
The enzyme with EC number 6.1.1.4 is named Leucine--tRNA ligase.
-
We can get the total number of pathways identified by counting the entries in the unstratified table using this command:
`zcat pathways_out/path_abun_unstrat.tsv.gz | wc -l`
Remember that we need to subtract 1 (for the header line). Therefore, in total there are 4 pathways identified.
-
Inspecting the stratified pathway abundances using `less pathways_out/path_abun_strat.tsv.gz`, we see that there are two taxa that contribute to the `PANTO-PWY` pathway.
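Instead of reading the table by eye, the contributing taxa could be counted on the command line. The sketch below assumes a layout with the pathway in the first column and the contributing taxon in the second; the real stratified table may be organised differently, and the file, taxon names and `OTHER-PWY` row are made up.

```shell
tmpdir=$(mktemp -d)
# Made-up gzipped stratified table (assumed layout: pathway, taxon, abundance).
printf 'pathway\ttaxon\tsample1\nPANTO-PWY\tTaxon A\t1.5\nPANTO-PWY\tTaxon B\t0.7\nOTHER-PWY\tTaxon A\t2.0\n' \
    | gzip > "$tmpdir/strat_example.tsv.gz"

# Decompress, keep rows for the pathway of interest, and count distinct taxa.
n_contrib=$(gzip -dc "$tmpdir/strat_example.tsv.gz" \
    | awk -F'\t' '$1 == "PANTO-PWY" { print $2 }' \
    | sort -u \
    | wc -l)
echo "$n_contrib"
```

Here `gzip -dc` plays the same role as `zcat` in the commands above.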