Differential Gene Expression - bcb420-2025/Clare_Gillis GitHub Wiki

TODO:

  • Calculate p-values for each of the genes in your expression set. How many genes were significantly differentially expressed? What thresholds did you use and why?
  • Multiple hypothesis testing - correct your p-values using a multiple hypothesis correction method. Which method did you use, and why? How many genes passed correction?
  • Show the amount of differentially expressed genes using an MA Plot or a Volcano plot. Highlight genes of interest.
  • Visualize your top hits using a heatmap. Do your conditions cluster together? Explain why or why not.
  • Make sure all your figures have proper headings and labels. Every figure included in the report should have a detailed figure legend.

1.1 - Starting assignment

Note: The quasi-likelihood model is highly recommended for bulk RNA-seq data (like ours).

When running d <- estimateDisp(d, model_design_pat) I keep getting "Design matrix not of full rank". What does this mean? It looks like it means the variables I'm using for the model are redundant: patient and diagnosis are directly correlated (each patient has exactly one diagnosis, so knowing the patient already tells you the diagnosis). So I'll use diagnosis and layer instead (should group by diagnosis but correct for layer differences).
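To see the rank problem in isolation, here is a minimal base R sketch. The sample sheet is made up for illustration (4 hypothetical patients, 2 per diagnosis, 2 layers each, with placeholder labels "control" and "case"); it is not the real metadata. It only compares the number of columns of each design matrix with its rank.

```r
# Hypothetical sample sheet (an assumption, not the real data):
# 4 patients, 2 per diagnosis, each sampled in 2 layers.
samples <- data.frame(
  patient   = factor(rep(c("P1", "P2", "P3", "P4"), each = 2)),
  diagnosis = factor(rep(c("control", "case"), each = 4),
                     levels = c("control", "case")),
  layer     = factor(rep(c("L1", "L2"), times = 4))
)

# Patient + diagnosis: the diagnosis column equals the sum of the
# P3 and P4 patient columns, so one column is redundant.
design_pat <- model.matrix(~ patient + diagnosis, data = samples)
rank_pat   <- qr(design_pat)$rank

# Layer + diagnosis: no column is a combination of the others.
design_layer <- model.matrix(~ layer + diagnosis, data = samples)
rank_layer   <- qr(design_layer)$rank

# A design is of full rank when its rank equals its number of columns.
pat_full_rank   <- rank_pat == ncol(design_pat)
layer_full_rank <- rank_layer == ncol(design_layer)
```

In this toy layout the patient design has 5 columns but rank 4, which is the situation that triggers the error, while the layer design has 3 columns and rank 3.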

I did the initial DGE analysis and got 8444 significantly differentially expressed genes before correction and 4375 after FDR correction. This makes sense since pretty much all genes on chromosome 21 should be included. I'm interested in the ones NOT on chromosome 21.

FDR = false discovery rate. topTags uses BH (Benjamini-Hochberg) by default, but I've specified it explicitly as well.
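For reference, a small sketch of what BH correction does to a vector of p-values, using base R's p.adjust. The six p-values and the 0.05 cutoff are made up for illustration; they are not from my results.

```r
# Hypothetical raw p-values (illustrative only).
p_raw <- c(0.001, 0.008, 0.039, 0.041, 0.20, 0.74)

# Benjamini-Hochberg adjustment (controls the false discovery rate).
p_bh <- p.adjust(p_raw, method = "BH")

# Count genes passing an assumed 0.05 threshold before and after correction.
n_raw <- sum(p_raw < 0.05)
n_bh  <- sum(p_bh < 0.05)
```

Adjusted p-values are never smaller than the raw ones, so the number of genes passing a fixed threshold can only stay the same or drop after correction.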

A volcano plot shows the negative log of the p-value (usually -log10) against the log fold change, which makes the significant changes in a dataset stand out. (According to Wikipedia)

Should I make the volcano plot with raw or BH-corrected p-values? I made both and they look identical except for a difference in scale. That fits with BH correction preserving the ranking of the p-values.
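A quick sketch of the raw versus BH comparison on simulated values. Everything here is made up (random fold changes and p-values, hypothetical variable names); it only checks that the BH-adjusted values keep the same ordering as the raw ones and are never more significant.

```r
set.seed(1)

# Simulated inputs (assumptions, not real results).
logFC <- rnorm(200)            # hypothetical log2 fold changes
p_raw <- runif(200)^2          # hypothetical raw p-values
p_bh  <- p.adjust(p_raw, method = "BH")

# Volcano plot y-coordinates for each version.
y_raw <- -log10(p_raw)
y_bh  <- -log10(p_bh)

# Sorted by raw p-value, the BH values never decrease,
# so the ranking of genes is preserved.
same_order <- all(diff(p_bh[order(p_raw)]) >= 0)

# Draw both plots side by side when run interactively.
if (interactive()) {
  par(mfrow = c(1, 2))
  plot(logFC, y_raw, xlab = "log2 fold change",
       ylab = "-log10(raw p-value)", main = "Raw p-values")
  plot(logFC, y_bh, xlab = "log2 fold change",
       ylab = "-log10(BH p-value)", main = "BH-corrected p-values")
}
```

Since the x-coordinates are shared and the y-ordering is unchanged, the two plots differ mainly in how far the points reach up the y-axis.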

1.2 - Moving on to heat mapping

Use ComplexHeatmap

That wasn't too bad. Looks like my samples cluster quite well by diagnosis, and a bit by layer. I'll need to readjust the legend and scale though. Also, I chose the 5th and 95th percentiles as the limits of the colour scale (instead of the max and min) because there seemed to be quite a lot of outliers.
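A minimal sketch of the percentile idea in base R. The matrix is simulated with two planted outliers (not my data), and clipping with pmin/pmax is just one way to show the effect of saturating the colour scale.

```r
set.seed(42)

# Hypothetical scaled expression matrix: 50 genes x 8 samples.
mat <- matrix(rnorm(50 * 8), nrow = 50)
mat[1, 1] <- 15    # planted outliers
mat[2, 3] <- -12

# Colour limits from min/max are dominated by the outliers.
lims_minmax <- range(mat)

# Limits from the 5th and 95th percentiles ignore the extremes.
lims_pct <- quantile(mat, probs = c(0.05, 0.95))

# Values beyond the limits saturate at the end colours;
# clipping the matrix shows the same effect explicitly.
mat_clipped <- pmin(pmax(mat, lims_pct[[1]]), lims_pct[[2]])
```

With ComplexHeatmap, limits like these could be used as the outer breaks in circlize::colorRamp2() and the resulting function passed as the col argument of Heatmap(), so that outliers take the end colours instead of stretching the whole scale.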