RNA Seq Pipeline - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki

This RNA-Seq Pipeline section covers each major processing step, from raw FASTQ files to a gene-level count matrix:

  1. Experimental Design & Metadata
    – Sample grouping, replicates, batch considerations
  2. Data Organization & Download
    – Directory structure, SRA/ENA fetch (e.g. fasterq-dump)
  3. Raw Read Quality Control
    – FastQC reports
    – Aggregation with MultiQC
  4. Adapter & Quality Trimming
    – Cutadapt or Trimmomatic usage
  5. Read Alignment or Pseudo-alignment
    – Genome-based: STAR or HISAT2
    – Alignment-free: Salmon or Kallisto
  6. Post-Alignment QC
    – RSeQC (e.g. gene-body coverage, inner distance)
    – Qualimap RNA-seq module
  7. Quantification & Count Matrix Generation
    – featureCounts or htseq-count for aligned BAMs
    – tximport for transcript-to-gene summarization of Salmon or Kallisto estimates
  8. Normalization & Exploratory Analysis
    – TPM/CPM calculation
    – PCA, hierarchical clustering
  9. Workflow Management & Reproducibility
    – Snakemake or Nextflow pipeline templates
    – Containerization (Docker/Singularity)

This roadmap guides the hands-on examples, code snippets, and recommended tools for each step.
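
As a small preview of the TPM/CPM calculation listed under step 8, here is a minimal sketch in Python with NumPy. It assumes a gene-by-sample count matrix and a vector of gene lengths in base pairs. The function names (`cpm`, `tpm`) and the toy numbers are illustrative only and do not come from any particular tool.

```python
import numpy as np


def cpm(counts):
    """Counts per million: divide each sample (column) by its library size.

    Assumes every sample has at least one read, so no library size is zero.
    """
    counts = np.asarray(counts, dtype=float)
    library_sizes = counts.sum(axis=0)  # total reads per sample
    return counts / library_sizes * 1e6


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale each sample to 1e6."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1e3
    rate = counts / lengths_kb[:, None]  # reads per kilobase, per gene and sample
    return rate / rate.sum(axis=0) * 1e6


# Hypothetical toy data: 3 genes (rows) x 2 samples (columns).
toy_counts = np.array([[10, 20],
                       [30, 20],
                       [60, 60]])
toy_lengths = np.array([1000, 2000, 3000])  # gene lengths in bp (made up)

cpm_values = cpm(toy_counts)
tpm_values = tpm(toy_counts, toy_lengths)

print(cpm_values.round(1))
print(tpm_values.round(1))
```

Both measures sum to one million within each sample. CPM corrects only for sequencing depth, while TPM also corrects for gene length, so TPM is the usual choice when comparing different genes within a sample. In a real run the lengths would typically come from the quantifier's own output (for example, the `Length` column that featureCounts writes), and Salmon reports TPM directly.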