RNA‐seq and enrichment protocols - Integrative-Transcriptomics/tss-prediction-comparison GitHub Wiki

Glossary of important terms

  • RNA-Sample: RNA samples can be obtained from various sources, such as cells, tissues, organs, or body fluids. Once the sample is collected, RNA extraction methods are used to isolate RNA molecules. RNA molecules are converted into a library of cDNA (complementary DNA) fragments suitable for sequencing.
  • Sequencing: The cDNA library is fragmented into smaller pieces to facilitate sequencing. This can be done enzymatically or through physical methods. The fragmented cDNA library is then sequenced using high-throughput sequencing technologies, such as Illumina sequencing or others. During sequencing, fluorescently labeled nucleotides are added to the growing DNA strand, and the emitted light signals are captured to determine the sequence of nucleotides.
  • Mapping: RNA-seq generates millions of short reads, each representing a fragment of an RNA molecule. Mapping is the process of aligning these reads to a reference genome or transcriptome, which determines the genomic location or transcript that each RNA fragment originated from.
  • Cappable-seq: Cappable-seq enriches primary transcripts by selectively labeling their 5′ triphosphorylated ends with a biotin tag, enabling their isolation from processed RNA. The method identifies transcription start sites (TSS) at single-base resolution and significantly reduces the share of ribosomal RNA, simplifying the transcriptome for analysis.
  • dRNA-Seq: Differential RNA-seq (dRNA-seq) involves the selective sequencing of primary transcripts by treating RNA samples with a 5′-phosphate-dependent terminator exonuclease (TEX), which degrades processed RNAs, thereby enriching the primary transcripts. After TEX treatment, primary transcripts are converted to cDNA libraries for sequencing, allowing the annotation of TSS and providing insights into gene expression and regulatory RNA features.
  • Capping: The 5′ triphosphorylated end is selected for enzymatic modification because it is characteristic of primary transcripts synthesized directly by RNA polymerase. This end is capped with a selectable tag, in this case a biotinylated GTP. Primary transcripts with an intact 5′ triphosphate (or 5′ diphosphate) end are thereby biotinylated in vitro and can then be isolated from the processed RNA.
  • Magnetism: Biotin has a very strong affinity for streptavidin, a protein often immobilized on magnetic beads. The biotinylated transcripts are therefore efficiently bound by adding them to a suspension of streptavidin-coated magnetic beads. This binding enables easy and selective isolation of the labeled transcripts from the entire RNA pool.
  • Enriched Reads / Enriched 5′-PPP: The “enriched reads” are the RNA sequence data that remain after these enrichment steps. These reads mainly represent the primary transcripts with intact 5′ triphosphorylated ends that were directly synthesized by RNA polymerase. Because most of the processed transcripts have been removed, the enriched reads provide a clear picture of the transcription start sites (TSS) and enable accurate mapping of gene expression and regulation.
  • Enzyme TEX: TEX stands for Terminator 5′-phosphate-dependent exonuclease. It plays a critical role in differentiating between primary and processed RNA transcripts: TEX specifically targets and degrades RNA molecules with a monophosphorylated 5′ end, which is typical of processed RNAs.
  • Shredding: TEX degrades RNA molecules that have a 5′ monophosphate. The TEX-treated sample therefore becomes enriched in primary transcripts (those with a 5′ triphosphate), whereas processed transcripts are depleted.
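
The Mapping and Enriched Reads entries above can be illustrated with a small Python sketch. It counts where mapped reads start and flags positions whose read starts are over-represented in an enriched library compared with a control library. Everything here is an assumption made for illustration: the read tuples `(start, end, strand)` with 1-based inclusive coordinates, the function names, the thresholds, and the pseudocount. It is not the procedure of any specific TSS prediction tool, and it skips library-size normalization.

```python
from collections import Counter


def count_read_starts(reads):
    """Count mapped reads per (strand, 5' position).

    Each read is a hypothetical (start, end, strand) tuple with
    1-based inclusive coordinates on the reference.
    """
    starts = Counter()
    for start, end, strand in reads:
        # The 5' end of a minus-strand read is its rightmost coordinate.
        five_prime = start if strand == "+" else end
        starts[(strand, five_prime)] += 1
    return starts


def enriched_positions(enriched, control, min_reads=5, min_ratio=2.0, pseudocount=1.0):
    """Return positions whose read starts are over-represented in the enriched library.

    Thresholds and pseudocount are illustrative defaults, not recommended values.
    """
    hits = {}
    for key, n_enriched in enriched.items():
        if n_enriched < min_reads:
            continue  # too few reads to be informative
        ratio = (n_enriched + pseudocount) / (control.get(key, 0) + pseudocount)
        if ratio >= min_ratio:
            hits[key] = ratio
    return hits


# Toy data: three read-start positions in an enriched and a control library.
enriched_reads = [(100, 150, "+")] * 8 + [(300, 350, "+")] * 2 + [(420, 500, "-")] * 6
control_reads = [(100, 150, "+")] * 2 + [(300, 350, "+")] * 6 + [(420, 500, "-")] * 6

hits = enriched_positions(count_read_starts(enriched_reads), count_read_starts(control_reads))
print(hits)
```

In this toy data only position 100 on the plus strand passes both thresholds. Position 300 has too few reads in the enriched library, and position 500 on the minus strand is equally covered in both libraries, as would be expected for a processed 5′ end.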

Figure 1: Comparison of the two RNA-seq methods, Cappable-seq and dRNA-seq, with respect to their enrichment steps.
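
The difference between the two enrichment strategies can also be sketched as a toy Python model. Cappable-seq selects positively (only ends that can be tagged are pulled down), whereas dRNA-seq selects by depletion (only ends that TEX degrades are removed). The `Transcript` class, the end labels, and the function names are hypothetical. The 5′-hydroxyl case is not part of the glossary; it is included as an assumption (that TEX leaves such ends intact) only to show how positive selection and depletion can differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    five_prime_end: str  # "PPP", "PP", "P" or "OH" (labels are illustrative)


def cappable_seq(pool):
    """Positive selection: keep only transcripts whose 5' end can be capped
    with the biotin tag (triphosphate or diphosphate) and bound to the beads."""
    return [t for t in pool if t.five_prime_end in {"PPP", "PP"}]


def drna_seq_tex(pool):
    """Depletion: remove transcripts that TEX degrades (5' monophosphate)
    and keep everything else. Keeping "OH" ends is an assumption of this sketch."""
    return [t for t in pool if t.five_prime_end != "P"]


pool = [
    Transcript("primary_mRNA", "PPP"),
    Transcript("processed_rRNA", "P"),
    Transcript("fragment", "OH"),
]

kept_by_cappable = [t.name for t in cappable_seq(pool)]
kept_by_tex = [t.name for t in drna_seq_tex(pool)]
print("Cappable-seq keeps:", kept_by_cappable)
print("dRNA-seq (TEX) keeps:", kept_by_tex)
```

Under these assumptions both protocols remove the monophosphorylated transcript, but only the depletion-based model retains the hydroxyl-ended fragment.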