Questions - ParkinsonLab/microbiome_helper GitHub Wiki
Name: ________________________ Student Number: ________________________________
##16S Tutorial
##Background
16S analysis is a method of microbiome analysis that, unlike shotgun metagenomics, targets a single gene: the 16S ribosomal RNA gene, which is present in all prokaryotes. The gene contains regions that are conserved across these organisms as well as variable regions that distinguish them. These characteristics make it useful for profiling microbial communities at lower cost than shotgun metagenomics. A similar workflow can be applied to eukaryotic microorganisms using the 18S rRNA gene.
This dataset was originally used in a project to determine whether knocking out the protein chemerin affects gut microbial composition. The full project used 116 mouse samples acquired from two different facilities; only 24 of those samples are included in this tutorial dataset, for simplicity. Metadata associated with each sample is recorded in the mapping file (map.txt). The mapping file lists the genotypes of interest: wildtype (WT), chemerin knockout (chemerin_KO), chemerin receptor knockout (CMKLR1_KO) and heterozygote for the receptor knockout (HET). Also of importance are the two source facilities: "BZ" and "CJS". It is generally a good idea to include as much metadata as possible, since it can easily be explored later on.
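As a minimal sketch of how the mapping file can be explored, the snippet below tallies the values in one metadata column. It assumes that map.txt is tab-delimited with a single header row and that the column is named `genotype`; both are assumptions to check against the actual file, and the function name `count_column_values` is hypothetical.

```python
import csv
from collections import Counter


def count_column_values(path, column="genotype"):
    """Tally the values in one column of a tab-delimited mapping file.

    Assumes the first row is a header; the default column name "genotype"
    is an assumption and may need adjusting to match map.txt.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row[column] for row in reader)


# Example (assuming map.txt is in the working directory):
# for value, n in count_column_values("map.txt").most_common():
#     print(value, n)
```

The same function can be pointed at any other metadata column, such as the one holding the source facility, by passing a different `column` argument.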
To get started, please go to: https://github.com/ParkinsonLab/microbiome_helper/wiki
##Questions
###Browsing Data
Question 1: Based on the genotype column of map.txt, how many samples are Wild Type, and how many are KOs?
Question 2: How many reads are in each fastq file in the fastq folder? Note that the raw reads are in FASTQ format.
Question 3: Why are there two sequence files per sample in the fastq folder?
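One possible way to count reads, sketched here under the assumption that every FASTQ record occupies exactly four lines (header, sequence, separator, quality), is to count lines and divide by four. The function name `count_fastq_reads` is hypothetical, and the gzip handling is included only in case the files are compressed.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming exactly four lines per record."""
    # Use gzip only when the file name suggests a compressed file.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Multi-line FASTQ records, where the sequence wraps over several lines, would break the four-line assumption; the check on the line count catches some such files, but not all.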
###Stitching paired-end reads
Question 4: What percent of reads were successfully stitched for sample 75CMK8KO?
###Filtering reads by quality and length
Question 5: How many of sample 75CMK8KO's reads were filtered out for not containing a match to the forward primer (which is the default setting in this case)?
###Conversion to FASTA and removal of chimeric reads
Question 6: Based on the output in "chimera_filter_log.txt", what is the mean number of chimera calls per sample? What is the mean percent of reads retained after this step ("nonChimeraCallsPercent" column)?
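A per-column mean can be computed with a few lines of Python. This sketch assumes that chimera_filter_log.txt is tab-delimited with one header row and one row per sample, which should be verified against the actual file. The column name "nonChimeraCallsPercent" comes from the question above; any other column name passed in, including the one holding the chimera calls, has to be read from the file's header.

```python
import csv
from statistics import mean


def column_mean(path, column):
    """Return the mean of a numeric column in a tab-delimited log file.

    Assumes the first row is a header and every remaining row holds a
    numeric value in the requested column.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        values = [float(row[column]) for row in reader]
    return mean(values)


# Example (assuming the log file is in the working directory):
# print(column_mean("chimera_filter_log.txt", "nonChimeraCallsPercent"))
```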
Question 7: What percent of stitched reads was retained for sample 75CMK8KO after all the filtering steps (HINT: you'll need to compare the original number of reads to the number of reads output by chimera_filter.pl)?
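The comparison in the hint amounts to a simple ratio. A minimal sketch follows; the function name `percent_retained` is hypothetical, and the counts in the example are made-up numbers chosen only to show the arithmetic, not values from this dataset.

```python
def percent_retained(n_original, n_final):
    """Percent of the original reads that remain after filtering."""
    if n_original <= 0:
        raise ValueError("n_original must be positive")
    return 100.0 * n_final / n_original


# Made-up counts for illustration only:
print(percent_retained(20000, 15000))  # prints 75.0
```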
###Run open-reference OTU picking pipeline
###Rarefy reads
Question 8: What is the read depth for sample "75CMK8KO"?
###Using STAMP to test for particular differences
Question 9: Based on the barplots in STAMP, which sample has the highest proportion of Oscillospira sequences?