Adapter and Quality Trimming - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki

4.3.4 Adapter & Quality Trimming

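Before alignment, it’s best practice to remove sequencing adapters and low-quality bases. Both introduce mismatches against the reference, so trimming typically improves mapping rates and reduces false positives such as spurious mismatches and variant calls.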

Tools & Installation

# via Bioconda (which relies on the conda-forge channel for dependencies)
conda install -c conda-forge -c bioconda cutadapt trimmomatic
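A quick way to confirm the installation is to check that both executables are on your PATH. The function below is a minimal sketch; `check_tools` is a hypothetical helper name, and it only verifies that each command can be found, not which version is installed.

```bash
#!/usr/bin/env bash
# Report whether each named executable can be found on PATH.
# check_tools is a hypothetical helper, not part of cutadapt or Trimmomatic.
check_tools() {
  local tool path
  for tool in "$@"; do
    # command -v prints the resolved path and fails if the tool is missing
    if path=$(command -v "$tool"); then
      echo "$tool: found ($path)"
    else
      echo "$tool: missing"
    fi
  done
}

check_tools cutadapt trimmomatic
```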

Cutadapt (Python)

# Paired-end trimming example
# Comments cannot follow a line-continuation backslash, so options are explained here:
#   -a / -A          : 3' adapter for R1 / R2 (Illumina TruSeq adapter prefix)
#   -q 20,20         : quality-trim the 5' and 3' ends with a cutoff of Q20
#   --minimum-length : drop the pair if a read is shorter than 30 nt after trimming
mkdir -p trimmed qc/cutadapt
cutadapt \
  -a AGATCGGAAGAGC \
  -A AGATCGGAAGAGC \
  -q 20,20 \
  --minimum-length 30 \
  -o trimmed/SampleA_R1.trimmed.fastq.gz \
  -p trimmed/SampleA_R2.trimmed.fastq.gz \
  raw_data/SampleA_R1.fastq.gz \
  raw_data/SampleA_R2.fastq.gz \
  > qc/cutadapt/SampleA.cutadapt.log
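For a 3′ adapter given with `-a`/`-A`, cutadapt removes the adapter and everything after it, and it also removes a partial adapter at the very end of a read (when the DNA fragment is shorter than the read, the sequencer reads into the adapter). The bash function below is a deliberately simplified sketch of that behaviour: `trim_3prime_adapter` is a hypothetical name, it works on a bare sequence rather than FASTQ records, and it uses exact matching only, whereas cutadapt tolerates mismatches and indels (maximum error rate 0.1 by default).

```bash
#!/usr/bin/env bash
# Simplified 3' adapter removal for one read (sequence only).
# trim_3prime_adapter is a hypothetical helper. Unlike cutadapt it uses exact
# matching only (no mismatches or indels); the default minimum overlap of 3
# for a partial adapter at the read end mirrors cutadapt's default --overlap.
trim_3prime_adapter() {
  local read=$1 adapter=$2 min_overlap=${3:-3}
  local k
  # Full adapter anywhere in the read: remove it and everything after it.
  if [[ $read == *"$adapter"* ]]; then
    echo "${read%%"$adapter"*}"
    return
  fi
  # Read runs into the adapter: its 3' end matches a prefix of the adapter.
  # Try the longest possible partial overlap first.
  for (( k = ${#adapter} - 1; k >= min_overlap; k-- )); do
    if (( k <= ${#read} )) && [[ ${read: -k} == "${adapter:0:k}" ]]; then
      echo "${read:0:${#read}-k}"
      return
    fi
  done
  # No adapter found: return the read unchanged.
  echo "$read"
}

trim_3prime_adapter ACGTACGTACGTAGATCGGAAGAGCTTTT AGATCGGAAGAGC   # prints ACGTACGTACGT
trim_3prime_adapter ACGTACGTACGTAGATC AGATCGGAAGAGC               # prints ACGTACGTACGT
```

Requiring at least 3 overlapping bases is a trade-off: accepting 1- or 2-base partial matches would clip bases that resemble the adapter purely by chance far too often.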

Typical cutadapt log (qc/cutadapt/SampleA.cutadapt.log):

This is cutadapt (cutadapt 3.5)
Command line parameters: -a AGATCGGAAGAGC -A AGATCGGAAGAGC -q 20,20 --minimum-length 30 …
Processing reads on 1 core in paired-end mode ...
Finished in 00:01:12
=== Summary ===
Total read pairs processed:            10,000,000
  Read 1 with adapter:                 4,500,000 (45.0%)
  Read 2 with adapter:                 4,450,000 (44.5%)
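Pairs that were too short:               200,000 (2.0%)
Pairs written (passing filters):       9,300,000 (93.0%)
Total basepairs processed:          3,000,000,000 bp
  Quality-trimmed:                    150,000,000 bp (5.0%)

  • -a / -A : 3′ adapter sequences for R1 / R2
  • -q 20,20 : quality-trim the 5′ and 3′ ends with a Phred cutoff of 20 (applied to both reads)
  • --minimum-length 30 : discard a read pair if either read is shorter than 30 nt after trimming (cutadapt's default --pair-filter=any)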
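Cutadapt's `-q` option does not simply delete every base below the cutoff. Its documentation describes the algorithm used by BWA: subtract the cutoff from each base quality, accumulate that sum from the read end inward, stop once it turns negative, and cut where it peaked. An isolated good base inside a low-quality tail is therefore removed too. The bash/awk function below is a minimal sketch of that idea, assuming Phred+33 qualities; `quality_trim` is a hypothetical name and the sketch is not a drop-in replacement for cutadapt.

```bash
#!/usr/bin/env bash
# Minimal sketch of BWA-style quality trimming, the algorithm cutadapt
# documents for -q. Assumes Phred+33 qualities; quality_trim is a
# hypothetical helper, not part of cutadapt.
# Usage: quality_trim SEQ QUAL CUTOFF_5PRIME CUTOFF_3PRIME  (prints trimmed SEQ)
quality_trim() {
  awk -v seq="$1" -v qual="$2" -v c5="$3" -v c3="$4" 'BEGIN {
    for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i   # char -> ASCII code
    n = length(qual)
    for (i = 1; i <= n; i++) q[i] = ord[substr(qual, i, 1)] - 33   # Phred scores

    # 5-prime end: add (cutoff - q) base by base, stop when the sum turns
    # negative, and cut after the position where the sum peaked.
    start = 1; s = 0; best = 0
    for (i = 1; i <= n; i++) {
      s += c5 - q[i]
      if (s < 0) break
      if (s > best) { best = s; start = i + 1 }
    }

    # 3-prime end: the same scan, walking inward from the last base.
    stop = n; s = 0; best = 0
    for (i = n; i >= 1; i--) {
      s += c3 - q[i]
      if (s < 0) break
      if (s > best) { best = s; stop = i - 1 }
    }

    # Keep positions start..stop (1-based, inclusive); empty if nothing is left.
    if (stop >= start) print substr(seq, start, stop - start + 1)
    else print ""
  }'
}

# Qualities K I ; < ) ( , % # $ encode Q 42 40 26 27 8 7 11 4 2 3
quality_trim ACGTACGTAC 'KI;<)(,%#$' 10 10   # prints ACGT
```

In this example the seventh base (Q11) is removed even though it is above the cutoff of 10, because it sits inside a low-quality tail; a rule that stopped at the first base ≥ Q10 from the 3′ end would have kept it.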

Trimmomatic (Java)

# Paired-end trimming example (the output directory must already exist)
mkdir -p trimmed
trimmomatic PE \
  -threads 4 \
  raw_data/SampleA_R1.fastq.gz raw_data/SampleA_R2.fastq.gz \
  trimmed/SampleA_R1.paired.fastq.gz trimmed/SampleA_R1.unpaired.fastq.gz \
  trimmed/SampleA_R2.paired.fastq.gz trimmed/SampleA_R2.unpaired.fastq.gz \
  ILLUMINACLIP:adapters/TruSeq3-PE.fa:2:30:10 \
  LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
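
  • ILLUMINACLIP:… – adapter clipping (seed mismatches: 2, palindrome clip threshold: 30, simple clip threshold: 10); adapter FASTA files such as TruSeq3-PE.fa ship in the adapters/ folder of the Trimmomatic distribution, so adjust the path to your installation
  • LEADING:3 / TRAILING:3 – remove leading / trailing bases with quality below 3
  • SLIDINGWINDOW:4:15 – scan from the 5′ end with a 4-base window and cut once the average quality in the window drops below 15
  • MINLEN:36 – discard reads shorter than 36 nt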
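SLIDINGWINDOW:4:15 scans from the 5′ end and cuts once the mean quality of a 4-base window drops below 15. A simplified bash/awk sketch of that rule, assuming Phred+33 qualities, is below. `sliding_window_trim` is a hypothetical name, and this version simply drops the whole failing window and everything after it, whereas Trimmomatic's own implementation may keep some good bases from inside that window.

```bash
#!/usr/bin/env bash
# Simplified sketch of Trimmomatic's SLIDINGWINDOW:<window>:<quality> step.
# Assumes Phred+33 qualities; sliding_window_trim is a hypothetical helper.
# Simplification: the read is cut at the start of the first failing window,
# whereas Trimmomatic may keep some good bases from inside that window.
# Usage: sliding_window_trim SEQ QUAL WINDOW MIN_MEAN_QUALITY
sliding_window_trim() {
  awk -v seq="$1" -v qual="$2" -v w="$3" -v thr="$4" 'BEGIN {
    for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i   # char -> ASCII code
    n = length(qual)
    keep = n                                  # default: no failing window
    for (i = 1; i + w - 1 <= n; i++) {        # scan windows from the 5-prime end
      sum = 0
      for (j = i; j < i + w; j++) sum += ord[substr(qual, j, 1)] - 33
      # mean < thr is tested as sum < w * thr to avoid floating point
      if (sum < w * thr) { keep = i - 1; break }
    }
    if (keep > 0) print substr(seq, 1, keep)
    else print ""
  }'
}

# Six Q40 bases (I) followed by four Q2 bases (#)
sliding_window_trim ACGTACGTAC 'IIIIII####' 4 15   # prints ACGTA
```

Here the Q40 base at position 6 is dropped only because it belongs to the first failing window; this is exactly the kind of detail where the sketch is cruder than Trimmomatic itself.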

Example Trimmomatic output (Trimmomatic writes its log to stderr):

Java HotSpot(TM) 64-Bit Server VM (build 25.291-b10, mixed mode)
TrimmomaticPE: Started with arguments:
  … raw_data/SampleA_R1.fastq.gz … ILLUMINACLIP:adapters/TruSeq3-PE.fa:2:30:10 …
Processing paired-end reads:
  Input Read Pairs:          10,000,000
  Both Surviving:            9,200,000 (92.0%)
  Forward Only Surviving:    300,000 (3.0%)
  Reverse Only Surviving:    250,000 (2.5%)
  Dropped:                   250,000 (2.5%)
TrimmomaticPE: Completed successfully
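"Both Surviving" pairs are written to the two *.paired.fastq.gz files, so those files should contain the same number of reads, and aligners expect R1 and R2 to stay in sync. A quick consistency check is sketched below; `count_reads` and `check_pair_counts` are hypothetical helpers that assume standard four-line FASTQ records with no wrapped sequence lines.

```bash
#!/usr/bin/env bash
# count_reads and check_pair_counts are hypothetical helpers.
# Assumes standard FASTQ: exactly 4 lines per record, no wrapped sequences.
count_reads() {
  gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Paired R1/R2 outputs must contain the same number of reads to stay in sync.
check_pair_counts() {
  local n1 n2
  n1=$(count_reads "$1")
  n2=$(count_reads "$2")
  if [ "$n1" = "$n2" ]; then
    echo "OK: $n1 read pairs"
  else
    echo "MISMATCH: R1=$n1 R2=$n2" >&2
    return 1
  fi
}

# Usage:
# check_pair_counts trimmed/SampleA_R1.paired.fastq.gz trimmed/SampleA_R2.paired.fastq.gz
```

For a real run, the reported count should match the "Both Surviving" figure in the Trimmomatic log.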

Next Steps

  1. Re-run FastQC/MultiQC on trimmed/ to confirm adapter removal and quality improvement.
  2. Proceed to Read Alignment (STAR/HISAT2) or to alignment-free quantification (Salmon/Kallisto).