Post‐Alignment QC - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki
4.5 Post-Alignment QC
Once your reads are aligned (or pseudo-aligned), you should verify that the alignments make biological sense and are free of technical artefacts. Two widely used toolkits are RSeQC and Qualimap RNA-seq.
4.5.1 Tools & Installation
# via Bioconda (conda-forge is listed first so shared dependencies resolve cleanly)
conda install -c conda-forge -c bioconda rseqc qualimap
- RSeQC: a collection of Python scripts for RNA-seq BAM QC
- Qualimap RNA-seq: Java-based GUI/CLI for comprehensive alignment QC
4.5.2 RSeQC Analyses
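1. Prepare a BED file of gene models
# RSeQC expects a 12-column BED (BED12) that carries the exon blocks of each transcript.
# One route from GTF uses the UCSC utilities gtfToGenePred and genePredToBed:
gtfToGenePred ref/annotations.gtf ref/annotations.genePred
genePredToBed ref/annotations.genePred ref/annotations.bed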
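RSeQC reads gene models from a 12-column BED file, so it is worth confirming the column count before running any of the scripts below. The following is a minimal sketch: `check_bed12` is a hypothetical helper name, not an RSeQC command, and it only counts tab-separated columns rather than validating their content.

```bash
# check_bed12 is a hypothetical helper, not part of RSeQC.
# It succeeds only if the file has at least one record and every
# record has exactly 12 tab-separated fields.
check_bed12() {
    awk -F'\t' '
        /^(#|track|browser)/ || NF == 0 { next }   # skip comments, headers, blank lines
        NF != 12 { bad++ }
        { total++ }
        END {
            printf "%d records, %d without 12 columns\n", total, bad
            exit (total == 0 || bad > 0)
        }
    ' "$1"
}
```

For example, `check_bed12 ref/annotations.bed || echo "annotations.bed is not BED12"` prints a warning when the conversion produced a shorter BED format.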
2. Gene-Body Coverage
# geneBody_coverage.py needs a coordinate-sorted BAM with its .bai index alongside it
mkdir -p qc/rseqc/geneBody
geneBody_coverage.py \
-r ref/annotations.bed \
-i align/star/SampleA.Aligned.sortedByCoord.out.bam \
-o qc/rseqc/geneBody/SampleA
- Output:
  - SampleA.geneBodyCoverage.txt (per-bin coverage across 5′→3′)
  - SampleA.geneBodyCoverage.curves.pdf (plot)
- What to look for: flat curve (∼uniform coverage) indicates no 3′ or 5′ bias.
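To put a number on "flat", you can compare coverage near the 3′ end with coverage near the 5′ end. The sketch below assumes the geneBodyCoverage.txt layout of a `Percentile` header row followed by one tab-separated row per BAM with 100 values; `bias_ratio` and the choice of 20 bins at each end are illustrative assumptions, not part of RSeQC.

```bash
# bias_ratio is a hypothetical helper: summed coverage of the last 20
# percentile bins (3' end) divided by that of the first 20 (5' end).
# Assumed input: header row "Percentile 1..100", then name + 100 values per row.
bias_ratio() {
    awk -F'\t' '
        $1 == "Percentile" { next }                     # skip the header row
        NF >= 101 {
            five = 0; three = 0
            for (i = 2;  i <= 21;  i++) five  += $i     # percentiles 1-20
            for (i = 82; i <= 101; i++) three += $i     # percentiles 81-100
            if (five > 0) printf "%s\t%.2f\n", $1, three / five
            else          printf "%s\tNA\n", $1
        }
    ' "$1"
}
```

Run it as `bias_ratio qc/rseqc/geneBody/SampleA.geneBodyCoverage.txt`. A value near 1 is consistent with even coverage, while values well above 1 point towards 3′ bias and values well below 1 towards 5′ bias; the plot remains the primary check.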
3. Inner Distance (Insert Size)
mkdir -p qc/rseqc/innerDist
inner_distance.py \
-i align/star/SampleA.Aligned.sortedByCoord.out.bam \
-r ref/annotations.bed \
-o qc/rseqc/innerDist/SampleA
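- Output (main files):
  - SampleA.inner_distance.txt (inner distance per read pair)
  - SampleA.inner_distance_freq.txt (binned frequencies)
  - SampleA.inner_distance_plot.pdf (histogram)
- What to look for: a single peak consistent with your library prep. Inner distance is the gap between the two mates, roughly the fragment length minus twice the read length, so a 250 bp fragment sequenced as 2 × 100 bp gives about 50 bp. Negative values mean the mates overlap.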
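If you want summary numbers next to the histogram, the per-pair file can be condensed with standard shell tools. This sketch assumes the second tab-separated column of the inner-distance text file holds the distance in bp and that pairs without a usable distance carry a non-numeric entry there; `inner_dist_summary` is a hypothetical helper name.

```bash
# inner_dist_summary is a hypothetical helper, not part of RSeQC.
# Assumed input: tab-separated, column 2 = inner distance in bp
# (non-numeric entries in column 2 are skipped).
inner_dist_summary() {
    awk -F'\t' '$2 ~ /^-?[0-9]+$/ { print $2 }' "$1" |
        sort -n |
        awk '
            { v[NR] = $1; sum += $1 }
            END {
                if (NR == 0) { print "no records"; exit 1 }
                median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
                printf "n=%d mean=%.1f median=%.1f\n", NR, sum / NR, median
            }'
}
```

A mean far from the median hints at a long tail, which is easier to judge in the PDF histogram than from the numbers alone.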
4. Read Distribution
mkdir -p qc/rseqc/readDistr
read_distribution.py \
-i align/star/SampleA.Aligned.sortedByCoord.out.bam \
-r ref/annotations.bed \
> qc/rseqc/readDistr/SampleA.readDist.txt
- Output: a text summary of reads mapping to exons, introns, intergenic regions
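- What to look for: a high exonic share (> 70 % for poly(A)-selected libraries) indicates good enrichment for mRNA. rRNA-depleted total-RNA libraries typically show more intronic signal.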
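The exonic share can be pulled out of the text summary directly. The sketch below assumes the summary contains a `Total Assigned Tags` line and a table whose third column is the tag count, with exonic rows named `CDS_Exons`, `5'UTR_Exons` and `3'UTR_Exons`; `exonic_fraction` is a hypothetical helper name. Note that the summary counts tags rather than reads, because a spliced read contributes more than one tag.

```bash
# exonic_fraction is a hypothetical helper, not part of RSeQC.
# Assumed input: a "Total Assigned Tags <n>" line plus table rows of the form
#   <Group> <Total_bases> <Tag_count> <Tags/Kb>
# It prints the percentage of assigned tags that fall in rows ending in "_Exons".
exonic_fraction() {
    awk '
        $1 == "Total" && $2 == "Assigned" { assigned = $4 }
        $1 ~ /_Exons$/                    { exonic += $3 }
        END {
            if (assigned > 0) printf "%.1f\n", 100 * exonic / assigned
            else { print "NA"; exit 1 }
        }
    ' "$1"
}
```

Run it as `exonic_fraction qc/rseqc/readDistr/SampleA.readDist.txt` and compare the result with the threshold that suits your library type.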
4.5.3 Qualimap RNA-seq Module
1. Run Qualimap
mkdir -p qc/qualimap/SampleA
qualimap rnaseq \
-bam align/star/SampleA.Aligned.sortedByCoord.out.bam \
-gtf ref/annotations.gtf \
-outdir qc/qualimap/SampleA \
-pe # include this flag for paired-end libraries
2. Review the HTML report
- Open qc/qualimap/SampleA/qualimapReport.html in your browser
- Key sections:
  - General statistics (mapping %, coverage)
  - Coverage histogram (coverage depth distribution)
  - Gene body coverage (another perspective on 5′→3′ bias)
  - GC content of aligned reads
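With more than one sample, the Qualimap call in step 1 is usually repeated per BAM. The sketch below only prints the commands so you can review them first; `qualimap_cmds` is a hypothetical helper, and it assumes STAR-style file names ending in `.Aligned.sortedByCoord.out.bam`, paired-end libraries, and paths without spaces.

```bash
# qualimap_cmds is a hypothetical helper that only PRINTS commands (a dry run).
# Usage: qualimap_cmds <annotation.gtf> <bam> [<bam> ...]
# Assumes paired-end data (-pe); drop -pe from the template for single-end libraries.
qualimap_cmds() {
    local gtf="$1" bam sample
    shift
    for bam in "$@"; do
        # derive the sample name by stripping the assumed STAR suffix
        sample=$(basename "$bam" .Aligned.sortedByCoord.out.bam)
        printf 'mkdir -p qc/qualimap/%s && qualimap rnaseq -bam %s -gtf %s -outdir qc/qualimap/%s -pe\n' \
            "$sample" "$bam" "$gtf" "$sample"
    done
}
```

For example, `qualimap_cmds ref/annotations.gtf align/star/*.Aligned.sortedByCoord.out.bam` lists one command per sample; once they look right, the output can be piped to `bash`.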
4.5.4 Interpreting QC Metrics
QC Metric | Good Practice | Warning Signs |
---|---|---|
Gene-body coverage | Flat 5′→3′ profile | 5′ or 3′ bias (degradation) |
Insert-size distribution | Sharp peak at library mean | Broad/shifted (library prep issues) |
Exonic read fraction | ≥ 70 % exonic | High intronic/intergenic noise |
Mapping rate | ≥ 80 % uniquely mapped | Low (< 70 %) → possible contamination |
GC bias | Matches expected GC content | Skewed distribution |
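The two numeric rows of the table can be turned into a quick per-sample check. This is a minimal sketch: `qc_flag` is a hypothetical helper, the labels OK, CHECK and WARN are invented for illustration, and the cut-offs simply mirror the table above as rules of thumb rather than hard limits.

```bash
# qc_flag is a hypothetical helper; thresholds mirror the table above.
# Usage: qc_flag <exonic_percent> <uniquely_mapped_percent>
qc_flag() {
    awk -v e="$1" -v u="$2" 'BEGIN {
        n = 0
        if (e + 0 < 70) msg[++n] = "WARN exonic fraction below 70%"
        if (u + 0 < 70) msg[++n] = "WARN uniquely mapped below 70%"
        else if (u + 0 < 80) msg[++n] = "CHECK uniquely mapped between 70% and 80%"
        if (n == 0) print "OK"
        for (i = 1; i <= n; i++) print msg[i]
    }'
}
```

For example, `qc_flag 82.5 91.0` prints `OK`, while `qc_flag 60 75` reports both the low exonic fraction and the borderline mapping rate. Coverage shape, insert-size distribution and GC bias still need a look at the plots.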
After confirming satisfactory QC, you can confidently proceed to Quantification & Count Matrix Generation.