Raw Read Quality Control - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki
4.3 Raw Read Quality Control
Before alignment, it’s critical to assess the quality of your raw FASTQ files. We’ll generate per-sample reports with FastQC and then combine them with MultiQC for a cohort-wide view.
4.3.1 Tools & Installation
```bash
# Install via Bioconda (Bioconda packages also depend on the conda-forge channel)
conda install -c conda-forge -c bioconda fastqc multiqc
```
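Before running anything, it can help to confirm that both tools actually ended up on your `PATH` (for example, that the right conda environment is active). The helper below is a hypothetical sketch, not part of FastQC or MultiQC; it only uses the shell's built-in `command -v`.

```bash
# require_tools TOOL...: print where each tool lives; report missing ones on
# stderr and return 1 if any tool cannot be found on PATH.
require_tools() {
  local tool status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s: %s\n' "$tool" "$(command -v "$tool")"
    else
      printf '%s: not found on PATH\n' "$tool" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `require_tools fastqc multiqc && fastqc --version && multiqc --version` stops early if either tool is missing and otherwise prints both versions.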
4.3.2 Run FastQC
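```bash
# Create an output directory (FastQC does not create it for you)
mkdir -p qc/fastqc

# Run FastQC on all paired-end FASTQs.
# --threads sets how many files are processed in parallel; adjust to your CPU count.
fastqc \
  --threads 4 \
  --outdir qc/fastqc \
  raw_data/*_R1.fastq.gz \
  raw_data/*_R2.fastq.gz
```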
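The globs above assume every sample follows the `*_R1.fastq.gz` / `*_R2.fastq.gz` naming pattern. A quick check that each R1 file has an R2 mate can catch a missing or misnamed file before FastQC runs. This is a sketch with a hypothetical helper name (`unpaired_r1`), and it only checks one direction (R1 to R2).

```bash
# unpaired_r1 R1_FILE...: report R1 files that do not exist, or whose _R2
# mate is missing; returns 1 if any problem is found.
unpaired_r1() {
  local r1 r2 status=0
  for r1 in "$@"; do
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    if [ ! -e "$r1" ]; then
      # An unmatched glob is passed through literally, so this also catches
      # "raw_data/*_R1.fastq.gz" matching nothing at all.
      printf 'no such file: %s\n' "$r1"
      status=1
    elif [ ! -e "$r2" ]; then
      printf 'no R2 mate for %s (expected %s)\n' "$r1" "$r2"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `unpaired_r1 raw_data/*_R1.fastq.gz && fastqc --threads 4 --outdir qc/fastqc raw_data/*_R1.fastq.gz raw_data/*_R2.fastq.gz`.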
Outputs (for each sample and read):

```text
qc/fastqc/SampleA_R1_fastqc.html
qc/fastqc/SampleA_R1_fastqc.zip
qc/fastqc/SampleA_R2_fastqc.html
qc/fastqc/SampleA_R2_fastqc.zip
```
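As the listing shows, FastQC names each report after its input file, dropping the `.fastq.gz` extension and appending `_fastqc`. The sketch below relies on that naming (assumed here only for the common `.gz`, `.fastq`, and `.fq` suffixes) to list inputs that have no report yet, for example after an interrupted run. `fastqc_report_base` and `missing_reports` are hypothetical helpers.

```bash
# fastqc_report_base FASTQ: expected report name stem, e.g.
# raw_data/SampleA_R1.fastq.gz -> SampleA_R1_fastqc
fastqc_report_base() {
  local name
  name="$(basename "$1")"
  name="${name%.gz}"      # drop the compression suffix first
  name="${name%.fastq}"
  name="${name%.fq}"
  printf '%s_fastqc\n' "$name"
}

# missing_reports OUTDIR FASTQ...: print inputs with no non-empty .zip report
# in OUTDIR; returns 1 if any are missing.
missing_reports() {
  local outdir="$1" fq base status=0
  shift
  for fq in "$@"; do
    base="$(fastqc_report_base "$fq")"
    if [ ! -s "$outdir/$base.zip" ]; then
      printf 'missing report for %s\n' "$fq"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `missing_reports qc/fastqc raw_data/*_R1.fastq.gz raw_data/*_R2.fastq.gz`.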
Each HTML report includes modules such as:
| Module | What it shows |
|---|---|
| Per base sequence quality | Boxplots of quality scores at each read position |
| Per sequence quality scores | Distribution of mean quality scores across reads |
| Per sequence GC content | GC% distribution across reads vs. a theoretical normal distribution |
| Adapter content | Cumulative percentage of reads containing known adapter sequences, by position |
| Sequence duplication levels | How many reads are duplicated, and how often |
| Overrepresented sequences | Highly abundant sequences (possible contaminants) |
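Besides the HTML, each `_fastqc.zip` archive contains a `summary.txt` file with one tab-separated line per module: status (`PASS`, `WARN`, or `FAIL`), module name, and input file name. Assuming that layout, a minimal sketch for listing only the modules that did not pass (both function names are hypothetical):

```bash
# fastqc_flags: read summary.txt lines on stdin, print the ones that are not PASS.
fastqc_flags() {
  awk -F'\t' '$1 != "PASS"'
}

# fastqc_flags_zip ZIP...: run fastqc_flags on the summary.txt inside each archive.
# unzip -p writes the matching member to stdout; by default unzip's "*" wildcard
# also matches "/", so "*/summary.txt" finds SampleA_R1_fastqc/summary.txt.
fastqc_flags_zip() {
  local zip
  for zip in "$@"; do
    unzip -p "$zip" '*/summary.txt' | fastqc_flags
  done
}
```

Usage: `fastqc_flags_zip qc/fastqc/*_fastqc.zip` prints one line per WARN or FAIL across all samples.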
Quick check: open one HTML report (e.g. `qc/fastqc/SampleA_R1_fastqc.html`) in your browser to inspect quality drops, adapters, or unusual GC peaks.
4.3.3 Aggregate with MultiQC
Rather than opening dozens of reports, MultiQC will collate them into a single dashboard:
```bash
# Create MultiQC output dir
mkdir -p qc/multiqc

# Aggregate all FastQC results
multiqc \
  --outdir qc/multiqc \
  qc/fastqc
```
Output: `qc/multiqc/multiqc_report.html`
The report includes:
- Mean quality per read position, with all samples overlaid on one plot
- An adapter content plot across samples
- A per-sample General Statistics table (total sequences, % GC, % duplicates)
- Controls to highlight, rename, or hide samples, plus navigation between sections
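MultiQC also writes the numbers behind the report as tab-separated files in a `multiqc_data/` folder next to the HTML, which is handy for scripting. The helper below pulls named columns out of such a table. The file name `multiqc_fastqc.txt` and the column names `Sample`, `Total Sequences`, and `%GC` are assumptions to check against your own output, and `tsv_cols` is a hypothetical helper.

```bash
# tsv_cols NAME...: read a TSV with a header row on stdin and print only the
# named columns, in the order given. Exits 1 if a column is not in the header.
tsv_cols() {
  # Join the requested names with "|" so names containing spaces survive.
  local IFS='|'
  awk -F'\t' -v OFS='\t' -v want="$*" '
    NR == 1 {
      n = split(want, names, "|")
      for (i = 1; i <= n; i++) {
        idx[i] = 0
        for (j = 1; j <= NF; j++) if ($j == names[i]) idx[i] = j
        if (idx[i] == 0) {
          print "missing column: " names[i] > "/dev/stderr"
          exit 1
        }
      }
    }
    {
      line = ""
      for (i = 1; i <= n; i++) line = line (i > 1 ? OFS : "") $(idx[i])
      print line
    }'
}
```

Usage (path and column names assumed): `tsv_cols Sample 'Total Sequences' '%GC' < qc/multiqc/multiqc_data/multiqc_fastqc.txt`.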
Tip: In the MultiQC report, look for:
- Samples with a large fraction of bases below Q20 (poor quality)
- High adapter contamination (> 5% of reads)
- GC content that deviates from the expected distribution (may indicate contamination)
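The Q20 cutoff in the tip refers to the Phred scale, where a quality score Q corresponds to an error probability `P = 10^(-Q/10)`. So Q20 means a 1 in 100 chance that a base call is wrong, and Q30 means 1 in 1,000. A one-line conversion helper (the name `phred_to_error` is hypothetical):

```bash
# phred_to_error Q: print the base-call error probability for Phred score Q,
# using P = 10^(-Q/10).
phred_to_error() {
  awk -v q="$1" 'BEGIN { print 10 ^ (-q / 10) }'
}
```

For example, `phred_to_error 20` prints `0.01` and `phred_to_error 30` prints `0.001`.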
4.3.4 Next Steps
- If any samples show poor quality or high adapter content, proceed to adapter/quality trimming (e.g. Cutadapt).
- Once trimming is complete, re-run FastQC/MultiQC on the trimmed FASTQs to confirm improvements before moving on to alignment.
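A minimal sketch of that trim-then-recheck loop with Cutadapt. The adapter sequence (`AGATCGGAAGAGC`, the shared prefix of Illumina TruSeq adapters), the Q20 quality cutoff, the 20 bp minimum length, and the `trimmed/` and `qc/*_trimmed` folders are assumptions for illustration; match them to your library prep. `trim_pair` is a hypothetical helper, and setting `DRY_RUN=1` prints each command instead of running it.

```bash
# Assumed adapter: common prefix of Illumina TruSeq read 1 and read 2 adapters.
ADAPTER="AGATCGGAAGAGC"

# trim_pair R1_PATH OUTDIR: trim one read pair; the R2 path is derived from R1.
#   -q 20  trim low-quality 3' ends below Q20 (assumed cutoff)
#   -m 20  discard a pair if either read ends up shorter than 20 bp (assumed)
trim_pair() {
  local r1="$1" outdir="$2"
  local r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
  local base
  base="$(basename "$r1" _R1.fastq.gz)"
  # With DRY_RUN set, "echo" is prepended so the command is printed, not run.
  ${DRY_RUN:+echo} cutadapt -j 4 -q 20 -m 20 \
    -a "$ADAPTER" -A "$ADAPTER" \
    -o "$outdir/${base}_R1.fastq.gz" \
    -p "$outdir/${base}_R2.fastq.gz" \
    "$r1" "$r2"
}

# Example usage (commented out so the helper can be sourced safely):
# mkdir -p trimmed qc/fastqc_trimmed qc/multiqc_trimmed
# for r1 in raw_data/*_R1.fastq.gz; do trim_pair "$r1" trimmed; done
# fastqc --threads 4 --outdir qc/fastqc_trimmed trimmed/*.fastq.gz
# multiqc --outdir qc/multiqc_trimmed qc/fastqc_trimmed
```

Keeping trimmed reads and their QC in separate folders makes it easy to compare the before and after MultiQC reports side by side.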