Raw Read Quality Control - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki

4.3 Raw Read Quality Control

Before alignment, it’s critical to assess the quality of your raw FASTQ files. We’ll generate per-sample reports with FastQC and then combine them with MultiQC for a cohort-wide view.

4.3.1 Tools & Installation

# Install via Bioconda (conda-forge listed first, as Bioconda recommends)
conda install -c conda-forge -c bioconda fastqc multiqc
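Before going further, it can help to confirm that both tools are actually on your PATH. The snippet below is a minimal sketch; `require_tools` is a hypothetical helper name introduced here for illustration, not part of FastQC or MultiQC.

```shell
# Sketch: fail early if a required command is missing.
# require_tools is a hypothetical helper, not shipped with either tool.
require_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not found on PATH: $tool" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
# require_tools fastqc multiqc && fastqc --version && multiqc --version
```

If either tool is reported missing, check that the conda environment you installed into is the one currently activated.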

4.3.2 Run FastQC

# Create an output directory
mkdir -p qc/fastqc

# Run FastQC on all paired-end FASTQs (--threads sets how many files are processed in parallel)
fastqc \
  --threads 4 \
  --outdir qc/fastqc \
  raw_data/*_R1.fastq.gz \
  raw_data/*_R2.fastq.gz
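FastQC treats every file independently, so it will not notice if one mate of a pair is missing. A quick pre-flight check can catch that before alignment does. This is a sketch that assumes the `_R1.fastq.gz` / `_R2.fastq.gz` naming used above; `check_pairs` is a hypothetical helper name.

```shell
# Sketch: confirm every *_R1.fastq.gz has a matching *_R2.fastq.gz.
# check_pairs is a hypothetical helper, not part of FastQC.
check_pairs() {
  dir="$1"
  missing=0
  for r1 in "$dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                 # glob matched nothing
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"      # derive the mate's file name
    if [ ! -e "$r2" ]; then
      echo "missing mate for: $r1" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: only start FastQC when all pairs are complete
# check_pairs raw_data && fastqc --threads 4 --outdir qc/fastqc raw_data/*_R[12].fastq.gz
```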

Outputs (for each sample/read):

qc/fastqc/SampleA_R1_fastqc.html  
qc/fastqc/SampleA_R1_fastqc.zip  
qc/fastqc/SampleA_R2_fastqc.html  
qc/fastqc/SampleA_R2_fastqc.zip  
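Each `.zip` archive holds a folder named after the report (for example `SampleA_R1_fastqc/`) containing `summary.txt`, a tab-separated file with one PASS/WARN/FAIL status per module. The sketch below filters that file for modules that did not pass, which is handy when you want a quick answer without opening the HTML. `flagged_modules` is a hypothetical helper name, and the usage line assumes `unzip` is installed.

```shell
# Sketch: print the modules FastQC marked WARN or FAIL.
# Reads a summary.txt (status <TAB> module <TAB> file) on stdin.
# flagged_modules is a hypothetical helper, not part of FastQC.
flagged_modules() {
  awk -F'\t' '$1 == "WARN" || $1 == "FAIL" { print $1 "\t" $2 }'
}

# Usage (assumes unzip is available):
# unzip -p qc/fastqc/SampleA_R1_fastqc.zip SampleA_R1_fastqc/summary.txt | flagged_modules
```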

Each HTML report includes modules such as:

| Module | What it shows |
|---|---|
| Per base sequence quality | Boxplots of quality scores at each read position |
| Per sequence quality scores | Distribution of mean quality per read |
| Per sequence GC content | Distribution of per-read GC% compared with a theoretical normal distribution |
| Adapter content | Cumulative proportion of reads containing adapter sequence at each position |
| Sequence duplication levels | Fraction of duplicated reads |
| Overrepresented sequences | Highly abundant sequences (possible contaminants) |

Quick check: open one HTML report (e.g. qc/fastqc/SampleA_R1_fastqc.html) in your browser to inspect quality drops, adapters, or unusual GC peaks.

4.3.3 Aggregate with MultiQC

Rather than opening dozens of reports, MultiQC will collate them into a single dashboard:

# Create MultiQC output dir
mkdir -p qc/multiqc

# Aggregate all FastQC results
multiqc \
  --outdir qc/multiqc \
  qc/fastqc

Output

qc/multiqc/multiqc_report.html

That report includes:

  • Per-base mean quality curves for all samples on a single plot
  • Adapter content plot across samples
  • Per-sample General Statistics table (total reads, GC%, duplication)
  • Easy filtering and navigation between samples

Tip: In the MultiQC report, look for:

  • Samples with a large fraction of bases below Q20 (poor quality)
  • High adapter contamination (>5% of reads)
  • GC content that deviates from the rest of the cohort (may indicate contamination)
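The adapter check in the tip above can also be scripted from the raw FastQC data. Each archive contains a `fastqc_data.txt` in which the adapter table sits between a `>>Adapter Content` line and the next `>>END_MODULE` line. The sketch below reports the highest percentage found in that table; `max_adapter_pct` is a hypothetical helper name, and the 5% cut-off simply mirrors the tip rather than any universal rule.

```shell
# Sketch: print the highest adapter percentage in a fastqc_data.txt (stdin).
# max_adapter_pct is a hypothetical helper, not part of FastQC or MultiQC.
max_adapter_pct() {
  awk -F'\t' '
    /^>>Adapter Content/ { inmod = 1; next }   # start of the adapter module
    /^>>END_MODULE/      { inmod = 0 }         # end of whichever module is open
    inmod && !/^#/ {                           # skip the "#Position ..." header row
      for (i = 2; i <= NF; i++) if ($i + 0 > max) max = $i + 0
    }
    END { print max + 0 }
  '
}

# Usage: list reports exceeding 5% adapter content (assumes unzip is installed)
# for zip in qc/fastqc/*_fastqc.zip; do
#   name=$(basename "$zip" .zip)
#   pct=$(unzip -p "$zip" "$name/fastqc_data.txt" | max_adapter_pct)
#   awk -v p="$pct" -v n="$name" 'BEGIN { if (p > 5) print n "\t" p "%" }'
# done
```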

4.3.4 Next Steps

  • If any samples show poor quality or high adapter content, proceed to adapter/quality trimming (e.g. Cutadapt).
  • Once trimming is complete, re-run FastQC/MultiQC on the trimmed FASTQs to confirm improvements before moving on to alignment.
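As a rough illustration of the trimming step, the sketch below wraps a paired-end Cutadapt call. Several things here are assumptions rather than recommendations from this tutorial: the adapter `AGATCGGAAGAGC` (the shared prefix of the Illumina TruSeq adapters; check your library kit), the quality cut-off of 20, the minimum length of 20, and the `trimmed/` and `qc/fastqc_trimmed` directory names. `trim_pair` is a hypothetical helper; setting `DRY_RUN=1` prints the command instead of running it.

```shell
# Assumption: TruSeq-style adapter prefix; override ADAPTER for other kits.
ADAPTER="${ADAPTER:-AGATCGGAAGAGC}"

# Sketch: trim one read pair with Cutadapt, given the R1 path.
# trim_pair is a hypothetical helper, not part of Cutadapt.
trim_pair() {
  r1="$1"
  r2="${r1%_R1.fastq.gz}_R2.fastq.gz"        # derive the mate's file name
  out1="trimmed/$(basename "$r1")"
  out2="trimmed/$(basename "$r2")"
  # -a/-A: 3' adapters for R1/R2; -q: quality cut-off; -m: minimum length
  set -- cutadapt -a "$ADAPTER" -A "$ADAPTER" -q 20 -m 20 \
    -o "$out1" -p "$out2" "$r1" "$r2"
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: trim every pair, then repeat QC on the trimmed reads
# mkdir -p trimmed qc/fastqc_trimmed qc/multiqc_trimmed
# for r1 in raw_data/*_R1.fastq.gz; do trim_pair "$r1"; done
# fastqc --threads 4 --outdir qc/fastqc_trimmed trimmed/*.fastq.gz
# multiqc --outdir qc/multiqc_trimmed qc/fastqc_trimmed
```

Writing the second round of reports to separate directories keeps the before and after results side by side for comparison.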