RNA Seq Pipeline - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki
The RNA-Seq Pipeline section covers each major processing step, from raw FASTQ files to a gene-level count matrix:
RNA-Seq Pipeline
- Experimental Design & Metadata
  – Sample grouping, replicates, batch considerations
- Data Organization & Download
  – Directory structure, SRA/ENA fetch (e.g. `fasterq-dump`)
- Raw Read Quality Control
  – FastQC reports
  – Aggregation with MultiQC
- Adapter & Quality Trimming
  – Cutadapt or Trimmomatic usage
- Read Alignment or Pseudo-alignment
  - Genome-based: STAR or HISAT2
  - Alignment-free: Salmon or Kallisto
- Post-Alignment QC
  – RSeQC (e.g. gene-body coverage, inner distance)
  – Qualimap RNA-seq module
- Quantification & Count Matrix Generation
  – featureCounts or HTSeq-count for aligned BAMs
  – tximport for transcript-to-gene summarization
- Normalization & Exploratory Analysis
  – TPM/CPM calculation
  – PCA, hierarchical clustering
- Workflow Management & Reproducibility
  – Snakemake or Nextflow pipeline templates
  – Containerization (Docker/Singularity)
This roadmap will guide our hands-on examples, code snippets, and recommended tools for each step.
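
As a small preview of the "TPM/CPM calculation" step in the roadmap, here is a minimal sketch in plain Python. The function names `cpm` and `tpm`, and the toy counts and gene lengths, are illustrative assumptions rather than part of any tool listed above; in practice these values would come from a count matrix produced by featureCounts, HTSeq-count, or tximport.

```python
def cpm(counts):
    """Counts per million: scale one sample's raw counts by its library size."""
    total = sum(counts)
    if total == 0:
        raise ValueError("library size is zero; cannot compute CPM")
    return [c / total * 1e6 for c in counts]


def tpm(counts, lengths_bp):
    """Transcripts per million: normalize by length first, then scale to 1e6."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Reads per kilobase of feature length
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all length-normalized rates are zero; cannot compute TPM")
    return [r / total * 1e6 for r in rates]


# Hypothetical toy data: three genes in a single sample
toy_counts = [100, 300, 600]
toy_lengths_bp = [1000, 3000, 2000]

cpm_values = cpm(toy_counts)
tpm_values = tpm(toy_counts, toy_lengths_bp)
```

CPM corrects only for sequencing depth, while TPM also corrects for feature length, so within a sample the TPM values always sum to one million. Neither is a substitute for the model-based normalization used in differential expression tools; they are mainly useful for exploratory plots such as PCA and clustering.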