Week 5: DE Protocol - bcb420-2025/Izumi_Ando GitHub Wiki

Count-based differential expression analysis of RNA sequencing data using R and Bioconductor

a detailed step-by-step protocol for running differential gene expression analysis using edgeR and DESeq, aimed at helping users choose between them and apply them properly

Citation

Anders, S., McCarthy, D. J., Chen, Y., Okoniewski, M., Smyth, G. K., Huber, W., & Robinson, M. D. (2013). Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nature Protocols, 8(9), 1765–1786. https://doi.org/10.1038/nprot.2013.099

Notes

intro + goals

  • protocol builds on earlier edgeR and DESeq papers but walks through full workflow for practical use
  • meant for biologists with limited stats experience who want to do RNA-seq DE analysis in R
  • both tools model count data with a negative binomial (NB) distribution, but differ in how they estimate the dispersion and how they moderate the per-gene estimates

normalization + filtering

  • shows how to import count data into R and filter out low-count genes
  • both methods normalize for sequencing depth and RNA composition, not just raw library size:
    • edgeR uses TMM (trimmed mean of M-values)
    • DESeq uses a median-of-ratios method
  • emphasizes that proper normalization is critical before any modeling
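
To make the filtering and normalization steps concrete, here is a minimal plain-Python sketch of a CPM-based low-count filter and a median-of-ratios size factor calculation. It is not the edgeR or DESeq code from the protocol: the function names, the toy counts, and the cutoff values are all assumptions made for illustration.

```python
import math
from statistics import median

# Toy count matrix (made-up numbers): rows are genes, columns are samples.
counts = {
    "geneA": [100, 200, 150, 300],
    "geneB": [50, 100, 75, 150],
    "geneC": [0, 1, 0, 2],
    "geneD": [10, 20, 15, 30],
}

def n_samples(counts):
    return len(next(iter(counts.values())))

def cpm(counts):
    """Counts per million, using raw column sums as library sizes."""
    n = n_samples(counts)
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n)]
    return {g: [row[j] / lib_sizes[j] * 1e6 for j in range(n)]
            for g, row in counts.items()}

def filter_low_counts(counts, min_cpm, min_samples):
    """Keep genes whose CPM exceeds min_cpm in at least min_samples samples."""
    values = cpm(counts)
    return {g: row for g, row in counts.items()
            if sum(v > min_cpm for v in values[g]) >= min_samples}

def size_factors(counts):
    """Median-of-ratios: for each sample, the median over genes of
    count / (geometric mean of that gene across samples)."""
    n = n_samples(counts)
    log_geo_means = {}
    for g, row in counts.items():
        if all(c > 0 for c in row):  # genes containing a zero are skipped
            log_geo_means[g] = sum(math.log(c) for c in row) / n
    return [
        math.exp(median(math.log(counts[g][j]) - lgm
                        for g, lgm in log_geo_means.items()))
        for j in range(n)
    ]

# The toy libraries hold only a few hundred reads, so the CPM cutoff here is
# far larger than anything sensible for real libraries of millions of reads.
kept = filter_low_counts(counts, min_cpm=5000.0, min_samples=2)
sf = size_factors(kept)
```

In this toy matrix every retained gene scales the same way across samples (1 : 2 : 1.5 : 3), so the size factors recover exactly those ratios, and geneC is dropped by the filter.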

dispersion estimation

  • dispersion reflects biological variability and is estimated differently by the two tools
    • edgeR uses empirical Bayes to shrink dispersions toward a common trend
    • DESeq fits a dispersion-mean relationship and, by default, takes the larger of the fitted value and the per-gene estimate (a conservative choice)
  • choice of method can impact sensitivity and FDR depending on sample size
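
A small sketch of what the dispersion means numerically may help here. Under the NB model the variance is the mean plus the dispersion times the squared mean, and the square root of the dispersion is the biological coefficient of variation. The two moderation functions are deliberately simplified stand-ins (hypothetical names): the weighted average is only a caricature of empirical Bayes shrinkage, not edgeR's actual computation.

```python
import math

def nb_variance(mean, dispersion):
    """NB variance: Poisson (technical) part plus biological part."""
    return mean + dispersion * mean ** 2

def biological_cv(dispersion):
    """Biological coefficient of variation implied by a dispersion value."""
    return math.sqrt(dispersion)

def moderate_weighted(gene_disp, trend_disp, weight):
    """Caricature of shrinkage: pull the per-gene estimate toward the trend.
    weight lies in [0, 1]; a larger weight means stronger shrinkage.
    This is an illustrative assumption, not edgeR's algorithm."""
    return weight * trend_disp + (1 - weight) * gene_disp

def moderate_maximum(gene_disp, trend_disp):
    """Conservative rule: never go below the fitted trend value."""
    return max(gene_disp, trend_disp)

# A gene with mean count 100 and dispersion 0.1 has variance well above
# the Poisson variance of 100.
var_example = nb_variance(100, 0.1)
bcv_example = biological_cv(0.04)
```

With few replicates the per-gene estimate is noisy, which is why both tools borrow information across genes instead of using it directly.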

differential testing

  • edgeR supports exact tests and GLM-based designs (good for complex experiments)
  • DESeq uses an NB-based test for two-group comparisons and GLM likelihood ratio tests for more complex designs (shrinkage of fold changes only arrived later, in DESeq2)
  • both packages control for multiple testing using Benjamini-Hochberg FDR
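
The Benjamini-Hochberg adjustment mentioned above is short enough to write out. This is a plain-Python sketch of the standard step-up procedure (the function name and the example p-values are made up); in R the equivalent is p.adjust with method "BH".

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down so a running minimum can enforce
    # monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank among ascending p-values
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

padj = bh_adjust([0.01, 0.04, 0.03, 0.005])
# Genes with padj below the chosen FDR cutoff are called differentially expressed.
significant = [i for i, q in enumerate(padj) if q < 0.05]
```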

practical guidance

  • includes actual R code examples for each step (loading data, fitting models, extracting DE genes)
  • recommends visual checks (MA plots, p-value histograms) before trusting results
  • suggests exporting results for downstream analysis like GO/pathway enrichment
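
The numbers behind the two visual checks can be sketched without any plotting library. The helpers below are hypothetical (including the pseudocount, which is an assumption to avoid taking the log of zero): one computes the M and A coordinates for a single gene, the other bins p-values the way a histogram would.

```python
import math

def ma_values(mean_a, mean_b, pseudocount=0.5):
    """Return (M, A) for one gene: M is the log2 fold change of B over A,
    A is the average log2 expression across the two conditions."""
    log_a = math.log2(mean_a + pseudocount)
    log_b = math.log2(mean_b + pseudocount)
    return log_b - log_a, (log_a + log_b) / 2

def pvalue_histogram(pvalues, bins=10):
    """Count p-values per equal-width bin on [0, 1]."""
    counts = [0] * bins
    for p in pvalues:
        index = min(int(p * bins), bins - 1)  # p == 1.0 goes in the last bin
        counts[index] += 1
    return counts

m_value, a_value = ma_values(7.5, 31.5)
hist = pvalue_histogram([0.01, 0.02, 0.5, 0.99, 1.0])
```

A healthy p-value histogram is roughly flat with a peak near zero; an MA plot should be centered on M = 0 for most genes after normalization.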

comments

  • paper's strength is its clarity: it walks through a complete real-world pipeline
  • useful if you’re trying to decide between edgeR and DESeq or want a reproducible script-based workflow
  • also highlights some pitfalls like poor normalization or ignoring low-count filtering