Table of Contents

- Paper 6: "Count-based differential expression analysis of RNA sequencing data using R and Bioconductor"
  - Introduction
  - Development of the Protocol
  - Procedure Steps
  - Software Implementation
  - Reproducible Research
  - Equipment Setup
  - Conclusion
- edgeR 4.0 Overview
  - Abstract
  - Introduction
  - Key Developments
  - New Functionalities
  - Conclusion
- References

Paper 6: "Count-based differential expression analysis of RNA sequencing data using R and Bioconductor"

Introduction

RNA-seq supports many applications, including expression analysis, alternative splicing, novel transcript discovery, RNA editing, and transcriptome characterization in non-model organisms.
The usual first analysis goal is to identify genes whose expression levels change between conditions, using tools such as DESeq and edgeR.

Development of the Protocol

The protocol runs from raw sequence reads, through feature counting, to differential expression discovery.
Quality checks are emphasized throughout the process.
The statistical methods operate on a table of feature counts, and further quality checks are applied to that table before statistical modeling.

Procedure Steps

Assess Sequence Quality

- Use ShortRead package to evaluate sequence quality.
- Generate quality assessment report in HTML format for review.

Collect Metadata of Experimental Design

- Create a metadata table named samples. This table includes sample identifiers, experimental conditions, blocking factors, and file names.
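As a minimal sketch of what such a table might hold, the Python snippet below parses a small sample sheet with the standard library. The column names (`sample_id`, `condition`, `batch`, `fastq`) and all values are hypothetical placeholders, not those used in the protocol, which builds its table in R.

```python
import csv
import io

# Hypothetical sample sheet: identifiers, conditions and file names are made up.
SAMPLE_SHEET = """\
sample_id,condition,batch,fastq
S1,control,A,S1.fastq.gz
S2,control,B,S2.fastq.gz
S3,treated,A,S3.fastq.gz
S4,treated,B,S4.fastq.gz
"""


def load_samples(text):
    """Parse a CSV sample sheet into a list of dicts, one per sample."""
    rows = list(csv.DictReader(io.StringIO(text)))
    ids = [row["sample_id"] for row in rows]
    # Duplicate identifiers would make later joins with count data ambiguous.
    if len(ids) != len(set(ids)):
        raise ValueError("sample identifiers must be unique")
    return rows


samples = load_samples(SAMPLE_SHEET)
for row in samples:
    print(row["sample_id"], row["condition"], row["batch"], row["fastq"])
```

Keeping the condition and the blocking factor as separate columns lets both enter the statistical model later without re-parsing file names.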

Map Reads to Reference Genome

- Use tophat2 for mapping reads to the reference genome. Include annotation via a GTF file to assist in mapping across exon-exon junctions.

Organize, Sort and Index BAM Files

- Sort and index BAM files using samtools. Prepare files for downstream tools like htseq-count.
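The sorting and indexing step can be sketched as a pair of command lines assembled in Python. This assumes samtools 1.3 or later, where `samtools sort` takes its output file via `-o`; the protocol itself was written against an older samtools with a different argument order. The file names are hypothetical, and the commands are only printed, not executed.

```python
import shlex


def samtools_commands(bam, prefix):
    """Build (but do not run) commands to coordinate-sort and index a BAM file."""
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # Coordinate sort; indexing requires a coordinate-sorted file.
        ["samtools", "sort", "-o", sorted_bam, bam],
        # Writes the index next to the sorted BAM.
        ["samtools", "index", sorted_bam],
    ]


# Hypothetical input file and output prefix.
commands = samtools_commands("S1.bam", "S1")
for cmd in commands:
    print(shlex.join(cmd))
```

Building each command as an argument list keeps it safe to pass to `subprocess.run` later, without shell quoting problems.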

Count Reads Using htseq-count

- Use htseq-count to assign reads to genes based on the alignments and the annotation, then link the resulting counts to the samples in the metadata table.
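A rough sketch of combining per-sample counts into one table is shown below. It assumes the two-column, tab-separated output of htseq-count, in which recent versions prefix summary rows such as `__no_feature` with a double underscore (older versions omit the prefix). The gene names and counts are invented for illustration.

```python
def parse_htseq_counts(text):
    """Parse htseq-count output (feature<TAB>count) into a dict."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        feature, value = line.split("\t")
        if feature.startswith("__"):
            continue  # skip summary rows such as __no_feature
        counts[feature] = int(value)
    return counts


def build_count_table(per_sample):
    """Combine per-sample dicts into {gene: [count per sample, in order]}."""
    genes = sorted(set().union(*(c.keys() for c in per_sample.values())))
    return {g: [per_sample[s].get(g, 0) for s in per_sample] for g in genes}


# Hypothetical htseq-count outputs for two samples.
raw = {
    "S1": "geneA\t10\ngeneB\t0\n__no_feature\t5\n__ambiguous\t1\n",
    "S2": "geneA\t7\ngeneB\t3\n__no_feature\t4\n__ambiguous\t0\n",
}
per_sample = {name: parse_htseq_counts(text) for name, text in raw.items()}
table = build_count_table(per_sample)
for gene, row in table.items():
    print(gene, row)
```

The column order follows the order of samples in `per_sample`, so it should match the row order of the metadata table.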

Software Implementation

The analysis is conducted largely within R and Bioconductor for ease of maintenance, training, and portability.
The paper also discusses how to run Unix commands from within the R environment to streamline the workflow.

Reproducible Research

The protocol emphasizes recording all commands and software versions used in the analysis to ensure reproducibility.
It recommends tools such as Sweave or knitr for creating executable documents that combine code and narrative.
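In R, `sessionInfo()` reports the versions of R and of every loaded package. As an illustrative analogue only, the Python sketch below collects the interpreter version, the platform, and the versions of a list of packages; the package names passed in are arbitrary examples.

```python
import platform
from importlib import metadata


def session_info(packages):
    """Collect interpreter, platform and package versions for a run log."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


# Package names here are arbitrary examples; the second should not exist.
info = session_info(["pip", "definitely-not-a-real-package-xyz"])
for key, value in info.items():
    print(f"{key}: {value}")
```

Writing this record alongside the results makes it possible to tell later which software versions produced a given output.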

Equipment Setup

The paper details how to set up the required software and download the example data.
It provides specific instructions for installing the tools and preparing the computational environment.

Conclusion

The protocol provides a foundation for RNA-seq data analysis, emphasizing reproducibility and the adaptability of the workflow to specific project needs.

edgeR 4.0 Overview

Title: edgeR 4.0: Powerful Differential Analysis of Sequencing Data with Expanded Functionality and Improved Support for Small Counts and Larger Datasets.
Authors: Yunshun Chen, Lizhong Chen, Aaron T. L. Lun, Pedro L. Baldoni, Gordon K. Smyth.
Affiliations: Includes the Bioinformatics Division at WEHI, Parkville, VIC, Australia, and Computational Sciences at Genentech Inc., USA.
Correspondence: Gordon K. Smyth.

Abstract

edgeR is an R/Bioconductor software package designed for differential analysis of sequencing data using read counts.
Over the past 15 years, edgeR has been widely used for data from sequencing technologies such as RNA-seq and ChIP-seq. It pioneered the use of the negative binomial distribution to model replicated read counts and of generalized linear models to analyse complex experimental designs.
The new version, edgeR 4.0, introduces infrastructure improvements such as support for fractional counts and model fitting in C++, along with new functionality for a range of analyses, including differential methylation and transcript-level expression.
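To illustrate why the negative binomial distribution suits replicated count data, the sketch below uses its standard mean-variance relationship, variance = mean + dispersion × mean², and compares it with the Poisson case, where the variance equals the mean. The numbers are arbitrary and the function name is hypothetical; this is not edgeR code.

```python
import math


def nb_variance(mean, dispersion):
    """Variance of a negative binomial count with the given mean and dispersion."""
    return mean + dispersion * mean ** 2


mean = 100
dispersion = 0.04  # arbitrary example value

poisson_var = mean  # Poisson: variance equals the mean
nb_var = nb_variance(mean, dispersion)
# The square root of the dispersion is the biological coefficient of variation.
bcv = math.sqrt(dispersion)

print(f"Poisson variance: {poisson_var}")
print(f"NB variance:      {nb_var}")
print(f"BCV:              {bcv:.2f}")
```

With a dispersion of zero the two variances coincide; any positive dispersion adds a quadratic term that dominates for highly expressed genes.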

Introduction

NGS technologies like RNA-seq and ChIP-seq have revolutionized biomedical research, with edgeR providing robust analytical methods.
edgeR 4.0 adapts to current technological advancements and user feedback, improving functionality especially for complex data types and large datasets.

Key Developments

Support for Fractional Counts: Allows for more precise handling of data, avoiding the need to round fractional counts.
Model Fitting in C++: Enhances computational efficiency, particularly beneficial for large datasets.
Statistical Enhancements: Improved accuracy in the quasi-likelihood pipeline for small counts and integration of empirical Bayes moderation methods.

New Functionalities

Differential Methylation Analysis: The scope of edgeR is expanded to cover differential methylation.
Transcript and Exon Usage: New tools for analyzing differential usage at the transcript and exon levels.
Pathway Analysis: Incorporation of tools to examine gene sets and pathways affected by experimental conditions.

Conclusion

The updates in edgeR 4.0 address both foundational statistical methods and expand the scope of applicable analyses.
The enhancements in computational infrastructure ensure edgeR remains a top choice for researchers needing detailed and accurate analysis of genomic data.

References

Anders, S., McCarthy, D.J., Chen, Y., Okoniewski, M., Smyth, G.K., Huber, W., & Robinson, M.D. (2013). Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nature Protocols, 8(9), 1765-1786. DOI: 10.1038/nprot.2013.099.
Chen, Y., Chen, L., Lun, A. T. L., Baldoni, P. L., & Smyth, G. K. (2024). edgeR 4.0: powerful differential analysis of sequencing data with expanded functionality and improved support for small counts and larger datasets. bioRxiv. https://doi.org/10.1101/2024.01.21.576131

Paper 6 & 7: DESeq & EdgeR - bcb420-2025/Keren_Zhang GitHub Wiki
