CBW 2021 PICRUSt2 Tutorial Answers

Answers for the CBW 2021 PICRUSt2 tutorial presented as part of the 2021 Canadian Bioinformatics Workshop on microbiome data analysis.

  1. There are 36 samples. You could either count the rows of the metadata table or type: wc -l input_files/picrust2_lab_metadata.tsv (this count includes the header line, so subtract one). Alternatively, you could get the number of columns in the sequence abundance table with this command:

    awk '{print NF}' input_files/ASV_abun.tsv | head -n 1

  2. You can count how many sequences there are in a FASTA file by counting how many header-lines (i.e. lines that begin with ">") there are:

    grep -c "^>" input_files/ASVs.fna
    
  3. The lowest NSTI value is 0.00010, which you can figure out with this R command: min(hsp_16S_nsti$metadata_NSTI).

  4. The genome represented by sequence 3bc9d66614c8c98d398ace7483422449 is predicted to have 6 copies of the 16S rRNA gene.

  5. The normalized abundance is 3.67, which means that there must have been 3 predicted marker gene copies (11/3 = 3.67).

  6. The column sums wouldn't typically be equal since there are different numbers of gene families for each predicted genome. Remember that each predicted genome corresponds to one input ASV, and these ASVs will be at variable relative abundances across your samples!

  7. Statement #2 is correct: "The stratified pathway abundances represent the abundances of the community-wide pathway levels contributed by an individual predicted genome." This is important to remember - due to how PICRUSt2 outputs the stratified pathway abundances you can't know whether all the genes necessary for expressing a pathway are present in an individual predicted genome. The pathway inference is done at the community level.

  8. There are 6 possible placement positions in the tree for ASV d3d5bc15a5f947217d626ad4a99c5757.

  9. Since these are human-associated microbial communities, they have been well characterized (the majority of reference genomes are from human-associated species).
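
The counting and arithmetic behind answers 1, 2 and 5 can be sketched on toy data. This is a minimal illustration, not the tutorial's own workflow: the two small files and their contents are made up for the example, and it assumes that the 11 in answer 5 is the raw ASV abundance being divided by the predicted number of marker gene copies.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy stand-ins for the tutorial inputs (hypothetical contents, not the real data).
tmp=$(mktemp -d)
printf 'sample_id\tgroup\nS1\tA\nS2\tB\nS3\tA\n' > "$tmp/metadata.tsv"
printf '>asv1\nACGT\n>asv2\nGGCC\n' > "$tmp/ASVs.fna"

# Answer 1: wc -l counts the header line too, so subtract one to get the sample count.
n_samples=$(( $(wc -l < "$tmp/metadata.tsv") - 1 ))

# Answer 2: anchoring the pattern with ^ counts only lines that begin with ">".
n_seqs=$(grep -c '^>' "$tmp/ASVs.fna")

# Answer 5: abundance divided by predicted marker gene copies (assumed: 11 / 3).
norm_abun=$(awk 'BEGIN { printf "%.2f", 11 / 3 }')

echo "samples=$n_samples sequences=$n_seqs normalized=$norm_abun"
rm -r "$tmp"
```

On the real tutorial files, the same commands would be pointed at input_files/picrust2_lab_metadata.tsv and input_files/ASVs.fna instead of the temporary toy files.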