Paper 3: RNA‐Sequencing - bcb420-2025/Keren_Zhang GitHub Wiki
RNA sequencing (RNA-seq) has evolved considerably since it was introduced more than a decade ago. Originally a method for measuring differential gene expression, RNA-seq now informs nearly every area of research into genome function.
- Origins: RNA-seq was first used for differential gene expression (DGE) analysis in organisms such as Zea mays, Arabidopsis thaliana, Saccharomyces cerevisiae, Mus musculus, and Homo sapiens.
- Workflow: The standard workflow has not changed significantly and includes RNA extraction, mRNA enrichment or ribosomal RNA depletion, cDNA synthesis, adaptor-ligated library preparation, and sequencing, typically producing 10-30 million reads per sample.
- Evolution of Methodologies: The technology has advanced through long-read RNA-seq and direct RNA sequencing (dRNA-seq). Compared with the microarray-based methods it replaced, RNA-seq gives a richer and less biased view of RNA biology.
- Short-Read vs Long-Read: The field has traditionally been dominated by Illumina short-read sequencing. Newer long-read platforms from Pacific Biosciences and Oxford Nanopore sequence full-length mRNAs, giving a clearer picture of transcript complexity.
- Beyond DGE: RNA-seq is now used for many applications beyond DGE, including studies of mRNA splicing, the roles of non-coding RNAs in regulating gene expression, and other complex aspects of RNA biology.
- Spatial Transcriptomics: Emerging approaches such as spatial transcriptomics (spatialomics) record where RNA transcripts are located within a tissue sample, placing transcriptomic data in its spatial context.
- Routine Applications: With ongoing advancements, techniques like single-cell RNA-seq and spatial RNA-seq are expected to become as routine as DGE analysis.
- Replacement of Short-Read Technologies: Long-read methods may replace short-read techniques in specific niches where their advantages can be fully exploited.
- Data Complexity: The complexity of data generated by newer RNA-seq technologies demands advanced computational tools and methodologies for effective data analysis and interpretation.
- Methodological Variance: The field continues to grapple with the challenge of methodological variance, particularly in how different RNA-seq approaches handle multi-mapped reads or isoform quantification.
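One source of the methodological variance described above is how a pipeline assigns multi-mapped reads to isoforms. The sketch below illustrates one common strategy, expectation-maximization (EM). Tools such as RSEM and kallisto build on EM with considerably richer models. This is a minimal, hypothetical example. It assumes that each read's set of compatible transcripts and each transcript's effective length are already known, and it includes no alignment, bias correction, or fragment-length model. The function name `em_multimap_counts` and the toy reads are invented for illustration.

```python
def em_multimap_counts(read_sets, lengths, max_iter=1000, tol=1e-10):
    """Hypothetical helper: expected read counts per transcript via EM.

    read_sets: one set per read, listing the transcripts it aligns to
               (a multi-mapped read lists more than one).
    lengths:   transcript -> effective length, assumed known in advance.
    """
    transcripts = sorted(lengths)
    n_reads = len(read_sets)
    # alpha[t] is the estimated fraction of reads that came from transcript t;
    # start from a uniform guess.
    alpha = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(max_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: split each read across its compatible transcripts, weighting
        # by alpha / length, because a longer transcript spreads its reads over
        # more start positions.
        for compatible in read_sets:
            weights = {t: alpha[t] / lengths[t] for t in compatible}
            total = sum(weights.values())
            for t, w in weights.items():
                expected[t] += w / total
        # M-step: re-estimate read fractions from the expected counts.
        new_alpha = {t: expected[t] / n_reads for t in transcripts}
        change = max(abs(new_alpha[t] - alpha[t]) for t in transcripts)
        alpha = new_alpha
        if change < tol:
            break
    # Convert read fractions back into expected read counts.
    return {t: alpha[t] * n_reads for t in transcripts}


# Toy data: 6 reads unique to A, 2 unique to B, 4 shared (multi-mapped).
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
counts = em_multimap_counts(reads, {"A": 1000, "B": 1000})
print({t: round(c, 3) for t, c in counts.items()})  # {'A': 9.0, 'B': 3.0}
```

In this toy case, splitting the four shared reads evenly would give 8 reads to A and 4 to B. EM instead uses the uniquely mapped reads as evidence of relative abundance and divides the shared reads in proportion, which gives 9 and 3.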
- Differential Gene Expression (DGE)
  - Methods used to identify quantitative changes in expression levels between experimental groups.
- Read Depth
  - Total number of sequencing reads obtained for a sample. Sufficient depth is needed for reliable analysis.
- Short-Read Sequencing
  - Technologies that generate reads of up to 500 bp, commonly used with fragmented or degraded mRNA.
- Long-Read Sequencing
  - Technologies that produce reads longer than 1,000 bp, capturing full-length or nearly full-length mRNAs and giving a more complete view of transcript diversity.
- Direct RNA Sequencing (dRNA-seq)
  - Sequencing of native RNA molecules without reverse transcription to cDNA. This approach can reveal RNA modifications and dynamics.
- Multi-mapped Reads
  - Reads that align to more than one location in the genome or transcriptome, for example to exons shared by several isoforms. Assigning these reads is a recurring challenge in data analysis.
- Synthetic Long Reads
  - A technique that builds long reads by assembling shorter reads, used to overcome limitations of short-read sequencing technologies.
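Read depth matters for DGE because raw counts scale with the number of reads sequenced. A sample sequenced twice as deeply shows roughly twice the count for every gene. The sketch below shows the simplest correction, counts per million (CPM), followed by a log2 fold change between two groups. It is a hypothetical illustration: `cpm`, `log2_fold_change`, the pseudocount of 1, and the toy counts are all assumptions. Dedicated tools such as edgeR and DESeq2 use more robust normalization (TMM, median-of-ratios) and statistical models of count dispersion to test significance. This sketch attempts neither.

```python
import math


def cpm(counts):
    """Counts per million: rescale one sample's gene counts by its read depth."""
    depth = sum(counts.values())  # total reads assigned to genes in this sample
    return {gene: c / depth * 1e6 for gene, c in counts.items()}


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 of mean CPM in group_b over mean CPM in group_a, for each gene."""
    norm_a = [cpm(sample) for sample in group_a]
    norm_b = [cpm(sample) for sample in group_b]
    result = {}
    for gene in group_a[0]:
        mean_a = sum(s[gene] for s in norm_a) / len(norm_a)
        mean_b = sum(s[gene] for s in norm_b) / len(norm_b)
        # The pseudocount avoids log2(0) for genes with no reads in one group.
        result[gene] = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return result


# Toy counts: the second sample in each group was sequenced twice as deeply.
control = [{"geneX": 100, "geneY": 900}, {"geneX": 200, "geneY": 1800}]
treated = [{"geneX": 400, "geneY": 600}, {"geneX": 800, "geneY": 1200}]
lfc = log2_fold_change(control, treated)
print({g: round(v, 3) for g, v in lfc.items()})  # {'geneX': 2.0, 'geneY': -0.585}
```

The raw geneX counts of the two control samples differ two-fold, but their CPM values are identical. As a result, the fold change reflects only the difference between the groups and not the difference in sequencing depth.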
- Stark, R., Grzelak, M., & Hadfield, J. (2019). RNA sequencing: the teenage years. Nature Reviews Genetics, DOI: 10.1038/s41576-019-0150-2.
- OpenAI. (2023). ChatGPT (Mar 14 version) [Large language model]. Available at https://chat.openai.com/chat.