07 Read counting - saltpinna/Genome_analysis_project GitHub Wiki

Read counting was done with HTSeq (htseq-count). The scripts used can be found under code/scripts/htseq_count_BH.sh and code/scripts/htseq_count_serum.sh. The results were then plotted together using the script code/scripts/Plot_htseq.r. The resulting plot is presented below. Since each condition had three samples, the data are plotted in triplicate, so each data point appears as three dots, one per replicate.

Questions

What is the distribution of the counts per gene? Are most genes expressed? How many counts would indicate that a gene is expressed?

As the plot above shows, the counts per gene vary widely, from 0 to about 350 000. From this plot alone it is difficult to say whether all genes are expressed, or to decide how many counts a gene needs before it is considered expressed. A read might be mapped incorrectly, which would suggest that a gene is expressed when it actually is not. Conversely, we might miss genes that are actually expressed if there are problems with the sequencing of the RNA. We can nevertheless conclude that many of the genes are expressed.
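One way to make the question of "how many counts" more concrete is to summarise a count table against an explicit cutoff. The sketch below is only an illustration and is not part of the project's scripts: the function names, the cutoff of 10 reads (`MIN_READS`), and the example data are all assumptions made here. It relies only on the standard htseq-count output layout, with one tab-separated gene ID and count per line and summary rows whose names start with `__` (for example `__no_feature`).

```python
import io
import statistics

# Assumption: a gene is called "expressed" if it has at least this many reads.
# The cutoff is arbitrary and chosen only for illustration.
MIN_READS = 10


def read_htseq_counts(handle):
    """Parse htseq-count output: one 'gene_id<TAB>count' pair per line."""
    counts = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        gene_id, count = line.split("\t")
        # htseq-count appends summary rows such as __no_feature; skip them.
        if gene_id.startswith("__"):
            continue
        counts[gene_id] = int(count)
    return counts


def summarize(counts, min_reads=MIN_READS):
    """Describe the count distribution and the share of genes above the cutoff."""
    values = list(counts.values())
    expressed = sum(1 for v in values if v >= min_reads)
    return {
        "genes": len(values),
        "zero_count": sum(1 for v in values if v == 0),
        "median": statistics.median(values),
        "max": max(values),
        "expressed": expressed,
        "fraction_expressed": expressed / len(values),
    }


# Made-up example data, not results from this project.
example = io.StringIO(
    "geneA\t0\n"
    "geneB\t3\n"
    "geneC\t120\n"
    "geneD\t350000\n"
    "__no_feature\t5000\n"
    "__ambiguous\t42\n"
)
summary = summarize(read_htseq_counts(example))
print(summary)
```

To use this on a real file, `read_htseq_counts` can be given an open file handle instead of the `io.StringIO` object. Because any fixed cutoff is a judgment call, it is worth running the summary with several values of `min_reads` and checking how much the fraction of expressed genes changes.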