Week 5: DE Protocol - bcb420-2025/Izumi_Ando GitHub Wiki
Count-based differential expression analysis of RNA sequencing data using R and Bioconductor
a detailed step-by-step protocol for running differential gene expression analysis using edgeR and DESeq, aimed at helping users choose between them and apply them properly
Citation
Anders, S., McCarthy, D. J., Chen, Y., Okoniewski, M., Smyth, G. K., Huber, W., & Robinson, M. D. (2013). Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nature Protocols, 8(9), 1765–1786. https://doi.org/10.1038/nprot.2013.099
Notes
intro + goals
- protocol builds on earlier edgeR and DESeq papers but walks through full workflow for practical use
- meant for biologists with limited stats experience who want to do RNA-seq DE analysis in R
- both tools model count data with negative binomial distribution, but differ in how they estimate dispersion and shrinkage
normalization + filtering
- shows how to import count data into R and filter out low-count genes
- both methods normalize for library size:
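  - edgeR uses TMM (trimmed mean of M-values)
  - DESeq uses a median-of-ratios method (each sample's size factor is the median ratio of its counts to the per-gene geometric mean)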
- emphasizes that proper normalization is critical before any modeling
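To make the filtering and median-of-ratios steps concrete, here is a minimal pure-Python sketch of the arithmetic. It is not the edgeR or DESeq code: the function names, the toy counts, and the thresholds (`min_cpm`, `min_samples`) are assumptions made for illustration only.

```python
import math
from statistics import median


def filter_low_counts(counts, min_cpm=1.0, min_samples=2):
    """Keep genes with counts-per-million above min_cpm in at least min_samples samples.

    counts maps gene name -> list of raw counts, one per sample.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    lib_sizes = [sum(col) for col in zip(*counts.values())]
    kept = {}
    for gene, row in counts.items():
        cpm = [c / n * 1e6 for c, n in zip(row, lib_sizes)]
        if sum(v > min_cpm for v in cpm) >= min_samples:
            kept[gene] = row
    return kept


def size_factors(counts):
    """Median-of-ratios size factors, computed on the log scale.

    Genes with a zero count in any sample are skipped because their
    geometric mean is zero.
    """
    rows = [r for r in counts.values() if all(c > 0 for c in r)]
    log_geo_means = [sum(math.log(c) for c in r) / len(r) for r in rows]
    n_samples = len(rows[0])
    return [
        math.exp(median(math.log(r[j]) - g for r, g in zip(rows, log_geo_means)))
        for j in range(n_samples)
    ]


# Toy data (hypothetical): sample 2 was sequenced twice as deeply as samples 1 and 3.
toy = {
    "g1": [10, 20, 10],
    "g2": [100, 200, 100],
    "g3": [50, 100, 50],
    "g4": [0, 1, 0],  # expressed in only one sample, so it is filtered out
}
kept = filter_low_counts(toy)
sf = size_factors(kept)  # roughly [0.79, 1.59, 0.79]
normalized = {g: [c / s for c, s in zip(row, sf)] for g, row in kept.items()}
```

After dividing by the size factors, each kept gene has the same normalized value in all three samples, which is the expected result when the only difference between samples is sequencing depth.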
dispersion estimation
- dispersion reflects biological variability and is estimated differently by the two tools
- edgeR uses empirical Bayes to shrink per-gene dispersions toward a common value or a mean-dependent trend
- DESeq fits a dispersion-mean relationship and, by default, assigns each gene the larger of the fitted value and its own estimate
- choice of method can impact sensitivity and FDR depending on sample size
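The idea behind dispersion and shrinkage can be sketched with the NB variance relationship, variance = mean + dispersion × mean². The code below is a deliberately simplified stand-in: it uses a method-of-moments estimate and a fixed-weight average toward the overall mean dispersion, whereas edgeR and DESeq use likelihood-based estimators. The function names, the `weight` parameter, and the toy counts are assumptions, and the counts are assumed to be already normalized.

```python
from statistics import mean, variance


def moment_dispersion(row):
    """Method-of-moments dispersion: (variance - mean) / mean^2, clipped at zero."""
    m = mean(row)
    if m == 0:
        return 0.0
    v = variance(row)  # sample variance (n - 1 denominator)
    return max(0.0, (v - m) / (m * m))


def shrink_toward_common(gene_disps, weight=0.5):
    """Pull each per-gene estimate toward the average dispersion.

    weight is a hypothetical fixed shrinkage strength; the real packages
    derive the amount of shrinkage from the data.
    """
    common = mean(gene_disps)
    return [weight * common + (1 - weight) * d for d in gene_disps]


# Toy normalized counts (hypothetical), three replicates per gene.
noisy_gene = [50, 100, 150]   # variance 2500, mean 100
stable_gene = [80, 100, 120]  # variance 400, mean 100

raw = [moment_dispersion(noisy_gene), moment_dispersion(stable_gene)]
shrunk = shrink_toward_common(raw)
```

With only three replicates the raw estimates (about 0.24 and 0.03 here) are very noisy, which is why both packages borrow information across genes instead of trusting per-gene estimates alone.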
differential testing
- edgeR supports exact tests and GLM-based designs (good for complex experiments)
- DESeq uses a test based on the NB distribution; shrinkage of fold changes is a feature of the later DESeq2, not of the DESeq version covered here
- both packages control for multiple testing using Benjamini-Hochberg FDR
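The Benjamini-Hochberg adjustment itself is short enough to write out. This is a minimal sketch of the standard procedure (the same adjustment R's `p.adjust` performs with `method = "BH"`), not code from either package; the function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is capped
    # by the adjusted values of all larger p-values (keeps them monotone).
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values
padj = benjamini_hochberg(pvals)
significant = [p <= 0.05 for p in padj]
```

In this example only the first gene passes a 5% FDR cutoff: the second and third genes have raw p-values below 0.05, but their adjusted values are about 0.053.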
practical guidance
- includes actual R code examples for each step (loading data, fitting models, extracting DE genes)
- recommends visual checks (MA plots, p-value histograms) before trusting results
- suggests exporting results for downstream analysis like GO/pathway enrichment
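The two visual checks reduce to simple quantities that can be computed before any plotting. The sketch below shows what goes on the axes of an MA plot and how p-values are binned for a histogram. The function names, the pseudocount, and the example values are assumptions, and the plotting itself is left out.

```python
import math


def ma_values(mean_a, mean_b, pseudocount=0.5):
    """Return (M, A) for one gene from its mean normalized counts in two conditions.

    M is the log2 fold change (b over a); A is the average log2 expression.
    The pseudocount avoids log of zero and is an illustrative choice.
    """
    log_a = math.log2(mean_a + pseudocount)
    log_b = math.log2(mean_b + pseudocount)
    return log_b - log_a, (log_a + log_b) / 2


def pvalue_histogram(pvalues, bins=10):
    """Count p-values in equal-width bins over [0, 1]."""
    counts = [0] * bins
    for p in pvalues:
        counts[min(int(p * bins), bins - 1)] += 1  # p == 1.0 goes in the last bin
    return counts


m, a = ma_values(99.5, 399.5)  # fourfold increase, so M is 2
hist = pvalue_histogram([0.01, 0.02, 0.5, 0.99, 1.0])
```

A healthy p-value histogram is roughly flat with a peak near zero. Other shapes can indicate a problem with the model or the filtering.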
comments
- paper's strength is its clarity: it walks through a complete real-world pipeline
- useful if you’re trying to decide between edgeR and DESeq or want a reproducible script-based workflow
- also highlights some pitfalls like poor normalization or ignoring low-count filtering