Post‐Alignment QC - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki

4.5 Post-Alignment QC

Once your reads are aligned (or pseudo-aligned), you should verify that the alignments make biological sense and are free of technical artefacts. Two widely used toolkits are RSeQC and Qualimap RNA-seq.

4.5.1 Tools & Installation

# via Bioconda
conda install -c bioconda rseqc qualimap
  • RSeQC: a collection of Python scripts for RNA-seq BAM QC
  • Qualimap RNA-seq: Java-based GUI/CLI for comprehensive alignment QC

4.5.2 RSeQC Analyses

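  1. Prepare a BED file of gene models
# Convert your GTF → 12-column BED (BED12), the format RSeQC expects (UCSC utilities)
gtfToGenePred ref/annotations.gtf ref/annotations.genePred
genePredToBed ref/annotations.genePred ref/annotations.bed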
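Because the RSeQC scripts read gene models from a 12-column BED file, it can be worth checking the column count before running them. The following is a minimal sketch; `is_bed12` is a hypothetical helper name, and it assumes a tab-separated file in which only `#`, `track`, and `browser` lines are headers.

```shell
# Hypothetical helper: succeed only if every record has exactly 12 tab-separated fields.
is_bed12() {
  awk -F'\t' '
    /^(#|track|browser)/ { next }        # skip comment and header lines
    NF != 12 { bad = 1; exit }           # stop at the first malformed record
    { n++ }                              # count well-formed records
    END { if (bad || n == 0) exit 1; exit 0 }
  ' "$1"
}
```

For example, `is_bed12 ref/annotations.bed && echo "BED12 OK"` prints the message only when the file passes the check.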

  2. Gene-Body Coverage

mkdir -p qc/rseqc/geneBody
geneBody_coverage.py \
  -r ref/annotations.bed \
  -i align/star/SampleA.Aligned.sortedByCoord.out.bam \
  -o qc/rseqc/geneBody/SampleA
  • Output:
    • SampleA.geneBodyCoverage.txt (per-bin coverage across 5′→3′)
    • SampleA.geneBodyCoverage.curves.pdf (plot)
  • What to look for: a roughly flat curve (∼uniform coverage) indicates no 3′ or 5′ bias; coverage that climbs towards the 3′ end typically points to degraded RNA.
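To put a number on "flat", one option is to compare mean coverage in the last and first fifth of the gene body. This is only a sketch: `gene_body_bias` is a hypothetical helper, the 20 % window is an arbitrary choice, and it assumes the `.geneBodyCoverage.txt` layout is a tab-separated `Percentile` header row followed by one row per sample (name, then per-bin coverage).

```shell
# Hypothetical helper: print "<sample><TAB><3'/5' coverage ratio>" for each sample row.
gene_body_bias() {
  awk -F'\t' '
    NR == 1 { next }                         # skip the "Percentile" header row
    {
      n = NF - 1                             # number of coverage bins
      k = int(n / 5)                         # bins in each 20% end window
      five = 0; three = 0
      for (i = 2; i <= 1 + k; i++)       five  += $i   # 5-prime window
      for (i = NF - k + 1; i <= NF; i++) three += $i   # 3-prime window
      if (five > 0) printf "%s\t%.2f\n", $1, three / five
      else          printf "%s\tNA\n", $1
    }
  ' "$1"
}
```

Run as `gene_body_bias qc/rseqc/geneBody/SampleA.geneBodyCoverage.txt`; a ratio near 1 suggests balanced coverage, while values well above 1 suggest 3′ bias.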

  3. Inner Distance (Insert Size)

mkdir -p qc/rseqc/innerDist
inner_distance.py \
  -i align/star/SampleA.Aligned.sortedByCoord.out.bam \
  -r ref/annotations.bed \
  -o qc/rseqc/innerDist/SampleA
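  • Output:
    • SampleA.inner_distance.txt (inner distance per read pair)
    • SampleA.inner_distance_freq.txt (binned frequencies)
    • SampleA.inner_distance_plot.pdf (histogram)
  • What to look for: a single, well-defined peak consistent with your library prep. Inner distance is the gap between the two mates (roughly fragment length minus both read lengths), so it is smaller than the fragment size and can be negative when mates overlap; e.g. a ~200–300 bp fragment library sequenced 2 × 100 bp should peak near 0–100 bp.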
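As a rough sanity check, the expected inner distance can be estimated from the fragment and read lengths and compared with the observed mean. This is a sketch under stated assumptions: both helper names are hypothetical, both mates are assumed to have the same length, and the per-pair file is assumed to hold the inner distance in its second tab-separated column.

```shell
# Hypothetical helper: expected inner distance = fragment length - 2 x read length.
expected_inner_distance() {
  echo $(( $1 - 2 * $2 ))    # $1 = fragment length (bp), $2 = read length (bp)
}

# Hypothetical helper: mean of column 2 (assumed to be the inner distance per pair).
mean_inner_distance() {
  awk -F'\t' '{ s += $2; n++ } END { if (n) printf "%.1f\n", s / n }' "$1"
}
```

For instance, `expected_inner_distance 300 100` gives 100, while `expected_inner_distance 150 100` gives -50, i.e. the mates overlap by 50 bp.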

  4. Read Distribution
mkdir -p qc/rseqc/readDistr
read_distribution.py \
  -i align/star/SampleA.Aligned.sortedByCoord.out.bam \
  -r ref/annotations.bed \
  > qc/rseqc/readDistr/SampleA.readDist.txt
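  • Output: a text summary of reads (tags) mapping to exons, introns, and intergenic regions
  • What to look for: a high exonic fraction (> 70 %) indicates good enrichment for mRNA; a large intronic or intergenic share can point to pre-mRNA or genomic DNA carry-over.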
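The exonic percentage is not printed directly, but it can be derived from the summary table. The sketch below assumes the report contains a `Total Assigned Tags` line and whitespace-separated rows whose group names end in `_Exons` with the tag count in the third column; `exonic_fraction` is a hypothetical helper, and using assigned tags as the denominator is a choice rather than a fixed convention.

```shell
# Hypothetical helper: percentage of assigned tags that fall in CDS or UTR exons.
exonic_fraction() {
  awk '
    /^Total Assigned Tags/ { assigned = $NF }   # denominator: tags assigned to any feature
    /_Exons/               { exonic += $3 }     # add Tag_count for each *_Exons row
    END {
      if (assigned > 0) printf "%.1f\n", 100 * exonic / assigned
      else              print "NA"
    }
  ' "$1"
}
```

Usage: `exonic_fraction qc/rseqc/readDistr/SampleA.readDist.txt` prints a single percentage that can be compared against the 70 % guideline.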

4.5.3 Qualimap RNA-seq Module

  1. Run Qualimap
   mkdir -p qc/qualimap/SampleA
   qualimap rnaseq \
     -bam align/star/SampleA.Aligned.sortedByCoord.out.bam \
     -gtf ref/annotations.gtf \
     -outdir qc/qualimap/SampleA \
     -pe    # include this flag for paired-end libraries

  2. Review the HTML report

  • Open qc/qualimap/SampleA/qualimapReport.html in your browser
  • Key sections:
    • General statistics (mapping %, coverage)
    • Coverage histogram (coverage depth distribution)
    • Gene body coverage (another perspective on 5′→3′ bias)
    • GC content of aligned reads

4.5.4 Interpreting QC Metrics

| QC Metric | Good Practice | Warning Signs |
|---|---|---|
| Uniform gene-body coverage | Flat 5′→3′ profile | 5′ or 3′ bias (degradation) |
| Insert-size distribution | Sharp peak at library mean | Broad/shifted (library prep issues) |
| Exonic read fraction | ≥ 70 % exonic | High intronic/intergenic noise |
| Mapping rate | ≥ 80 % uniquely mapped | Low (< 70 %) → possible contamination |
| GC bias | Matches expected GC content | Skewed distribution |
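The percentage thresholds in the table can be applied mechanically once the metrics have been collected. A minimal sketch, assuming you supply the values yourself; `qc_flag` and its three labels are hypothetical, and the thresholds are passed in rather than hard-coded so they can be adapted per library type.

```shell
# Hypothetical helper: classify a percentage against a "good" and a "warning" threshold.
# $1 = metric value (%), $2 = good-practice threshold, $3 = warning threshold
qc_flag() {
  awk -v v="$1" -v good="$2" -v warn="$3" 'BEGIN {
    if (v >= good)      print "PASS"         # meets the good-practice threshold
    else if (v >= warn) print "BORDERLINE"   # between the two thresholds
    else                print "WARN"         # below the warning threshold
  }'
}
```

For the mapping rate row, `qc_flag 85.2 80 70` prints PASS and `qc_flag 65 80 70` prints WARN; for metrics with a single threshold, such as the exonic fraction, pass the same value twice (`qc_flag 72 70 70`).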

After confirming satisfactory QC, you can confidently proceed to Quantification & Count Matrix Generation.