Adapter and Quality Trimming - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki
4.3.4 Adapter & Quality Trimming
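Before alignment, it's best practice to remove sequencing adapters and low-quality bases. Adapter sequence and error-prone low-quality tails do not match the reference, so trimming them typically improves mapping rates and reduces false-positive mismatches and variant calls.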
Tools & Installation
# via Bioconda (Bioconda packages also depend on the conda-forge channel)
conda install -c conda-forge -c bioconda cutadapt trimmomatic
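Before running either tool, it can help to fail early if a command is missing from PATH. This is a minimal sketch; require_tools is a hypothetical helper name, and the trimmomatic command name assumes the Bioconda wrapper script used in this section.

```bash
#!/usr/bin/env bash
# Sketch: return non-zero (and name the culprits) if any tool is not on PATH.
# require_tools is a hypothetical helper, not part of cutadapt or Trimmomatic.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (commented out so the snippet is safe to source):
# require_tools cutadapt trimmomatic || exit 1
```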
Cutadapt (Python)
# Paired-end trimming example
# A comment cannot follow a line-continuation backslash, so the options are explained here:
#   -a / -A              3' adapter for R1 / R2
#   -q 20,20             trim low-quality bases (Q<20) from the 5' and 3' ends
#   --minimum-length 30  drop read pairs in which a read is shorter than 30 nt after trimming
mkdir -p trimmed qc/cutadapt
cutadapt \
  -a AGATCGGAAGAGC \
  -A AGATCGGAAGAGC \
  -q 20,20 \
  --minimum-length 30 \
  -o trimmed/SampleA_R1.trimmed.fastq.gz \
  -p trimmed/SampleA_R2.trimmed.fastq.gz \
  raw_data/SampleA_R1.fastq.gz \
  raw_data/SampleA_R2.fastq.gz \
  > qc/cutadapt/SampleA.cutadapt.log
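To trim several samples, the same cutadapt call can be wrapped in a shell function. This is a minimal sketch assuming the raw_data/<sample>_R1.fastq.gz and _R2.fastq.gz naming from the example; trim_sample and the DRY_RUN switch are hypothetical. Building the options in a bash array also lets each option carry its own comment, which a backslash-continued command line does not allow.

```bash
#!/usr/bin/env bash
# Sketch: run the paired-end cutadapt command for one sample.
# trim_sample and DRY_RUN are hypothetical; file naming follows the example above.
trim_sample() {
  local s="$1"
  local -a args
  args=(
    -a AGATCGGAAGAGC          # 3' adapter for R1
    -A AGATCGGAAGAGC          # 3' adapter for R2
    -q 20,20                  # quality cutoffs for the 5' and 3' ends
    --minimum-length 30       # drop pairs in which a read ends up < 30 nt
    -o "trimmed/${s}_R1.trimmed.fastq.gz"
    -p "trimmed/${s}_R2.trimmed.fastq.gz"
    "raw_data/${s}_R1.fastq.gz"
    "raw_data/${s}_R2.fastq.gz"
  )
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo cutadapt "${args[@]}"    # print the command instead of running it
  else
    mkdir -p trimmed qc/cutadapt
    cutadapt "${args[@]}" > "qc/cutadapt/${s}.cutadapt.log"
  fi
}

# Example:
# for s in SampleA SampleB; do trim_sample "$s"; done
```

Setting DRY_RUN=1 prints each command without running it, which is a cheap way to check file names before starting a long batch.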
Typical cutadapt report (qc/cutadapt/SampleA.cutadapt.log; illustrative and abridged, since labels and layout vary between cutadapt versions):
This is cutadapt (cutadapt 3.5)
Command line parameters: -a AGATCGGAAGAGC -A AGATCGGAAGAGC -q 20,20 --minimum-length 30 …
Processing reads on 1 core in paired-end mode ...
Finished in 00:01:12
=== Summary ===
Total read pairs processed: 10,000,000
Read 1 with adapter: 4,500,000 (45.0%)
Read 2 with adapter: 4,450,000 (44.5%)
Both reads too short: 200,000 (2.0%)
Pairs kept: 9,300,000 (93.0%)
Total basepairs processed: 3,000,000,000 bp
Quality-trimmed: 150,000,000 bp (5.0%)
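The summary counts can be pulled out of the report for QC tables. This is a minimal sketch: log_count is a hypothetical helper, and because report labels differ between cutadapt versions (the report above is abridged), labels should be passed exactly as they appear in your own report.

```bash
#!/usr/bin/env bash
# Sketch: print the count for one labelled line of a cutadapt report.
# log_count is a hypothetical helper; labels must match your report exactly.
log_count() {
  # $1 = report file, $2 = label (the text before the first colon)
  awk -F':' -v label="$2" '
    {
      key = $1
      sub(/^[ \t]+/, "", key)   # ignore indentation before the label
      sub(/[ \t]+$/, "", key)   # ignore padding before the colon
    }
    key == label {
      split($2, parts, " ")     # first token after the colon is the count
      gsub(/,/, "", parts[1])   # drop thousands separators
      print parts[1]
      exit
    }' "$1"
}

# Example (labels taken from the illustrative report above):
# total=$(log_count qc/cutadapt/SampleA.cutadapt.log "Total read pairs processed")
# adapt=$(log_count qc/cutadapt/SampleA.cutadapt.log "Read 1 with adapter")
# echo "R1 adapter rate: $(( 100 * adapt / total ))%"
```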
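- -a / -A: 3′ adapter sequences for R1 / R2 (AGATCGGAAGAGC is the common prefix of the standard Illumina TruSeq adapters)
- -q 20,20: quality-trim the 5′ and 3′ ends with a cutoff of Q20
- --minimum-length 30: discard reads shorter than 30 nt after trimming; in paired-end mode the whole pair is dropped if either read is too short (cutadapt's default --pair-filter=any)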
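The -q cutoff is not a simple "drop every base below Q20" rule. Cutadapt documents its quality trimming as the BWA algorithm: scanning in from the 3′ end, it accumulates (cutoff - quality), stops once that running sum turns negative, and cuts where the sum peaked, so an occasional good base inside a low-quality tail does not stop the trim. The sketch below covers only the 3′ end, assumes Phred+33 qualities, and uses a hypothetical helper name, qual_trim_3p; the 5′ end works the same way from the other side.

```bash
#!/usr/bin/env bash
# Sketch of BWA-style 3' quality trimming as documented for cutadapt -q.
# qual_trim_3p is a hypothetical helper; qualities are assumed to be Phred+33.
# Prints the number of bases to keep from the 5' end.
qual_trim_3p() {
  local quals="$1" cutoff="$2"
  local n=${#quals} keep=${#quals} s=0 best=0 i q
  for (( i = n - 1; i >= 0; i-- )); do
    q=$(( $(printf '%d' "'${quals:i:1}") - 33 ))  # ASCII code minus 33
    s=$(( s + cutoff - q ))     # running sum of (cutoff - quality) from the 3' end
    if (( s < 0 )); then        # good bases now outweigh the bad tail: stop
      break
    fi
    if (( s > best )); then     # largest deficit so far: cut before base i
      best=$s
      keep=$i
    fi
  done
  echo "$keep"
}

# Example: qual_trim_3p 'IIIII###' 20   # prints 5
```

In the example, the three '#' bases (Q2) are removed and the five 'I' bases (Q40) are kept. With 'III#I##' the result is also 5: the lone Q40 base at position 4 is kept because the scan only stops once good bases outweigh the accumulated deficit.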
Trimmomatic (Java)
# Paired-end trimming example
trimmomatic PE \
-threads 4 \
raw_data/SampleA_R1.fastq.gz raw_data/SampleA_R2.fastq.gz \
trimmed/SampleA_R1.paired.fastq.gz trimmed/SampleA_R1.unpaired.fastq.gz \
trimmed/SampleA_R2.paired.fastq.gz trimmed/SampleA_R2.unpaired.fastq.gz \
ILLUMINACLIP:adapters/TruSeq3-PE.fa:2:30:10 \
LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
- ILLUMINACLIP:…: adapter clipping (seed mismatches: 2, palindrome clip threshold: 30, simple clip threshold: 10)
- LEADING:3 / TRAILING:3: drop bases below Q3 from the start / end of the read
- SLIDINGWINDOW:4:15: scan with a 4-base window and cut when the average quality falls below 15
- MINLEN:36: discard reads shorter than 36 nt
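SLIDINGWINDOW works differently from cutadapt's -q. Below is a simplified sketch of the idea described above: scan from the 5′ end and cut at the start of the first window whose average quality falls below the threshold. It is not Trimmomatic's own code, whose exact cut position can differ; sliding_window_keep is a hypothetical name and Phred+33 qualities are assumed.

```bash
#!/usr/bin/env bash
# Simplified sketch of a SLIDINGWINDOW:<size>:<quality> scan (not Trimmomatic's code).
# sliding_window_keep is hypothetical; qualities are assumed to be Phred+33.
# Prints the number of bases to keep from the 5' end.
sliding_window_keep() {
  local quals="$1" w="$2" thr="$3"
  local n=${#quals} i j sum
  local -a q=()
  for (( i = 0; i < n; i++ )); do
    q[i]=$(( $(printf '%d' "'${quals:i:1}") - 33 ))
  done
  for (( i = 0; i + w <= n; i++ )); do
    sum=0
    for (( j = i; j < i + w; j++ )); do
      sum=$(( sum + q[j] ))
    done
    if (( sum < thr * w )); then  # same as mean < thr, without fractions
      echo "$i"                   # cut at the start of the failing window
      return
    fi
  done
  echo "$n"                       # no failing window: keep the whole read
}

# A MINLEN:36-style filter would then drop the read if the result is < 36.
```

For example, sliding_window_keep 'IIIIII####' 4 15 prints 5: the window starting at 0-based position 5 (one Q40 base and three Q2 bases) averages 11.5, below 15.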
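Example Trimmomatic console output (abridged; Trimmomatic writes this summary to stderr). Reads whose mate was dropped are written to the .unpaired.fastq.gz files: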
Java HotSpot(TM) 64-Bit Server VM (build 25.291-b10, mixed mode)
TrimmomaticPE: Started with arguments:
… raw_data/SampleA_R1.fastq.gz … ILLUMINACLIP:adapters/TruSeq3-PE.fa:2:30:10 …
Processing paired-end reads:
Input Read Pairs: 10,000,000
Both Surviving: 9,200,000 (92.0%)
Forward Only Surviving: 300,000 (3.0%)
Reverse Only Surviving: 250,000 (2.5%)
Dropped: 250,000 (2.5%)
TrimmomaticPE: Completed successfully
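Every input pair ends up in exactly one of the four categories (both surviving, forward only, reverse only, dropped), so the counts should add up to the number of input pairs. A minimal sketch of that check, with hypothetical helper names (tm_count, tm_check_totals); it expects each field written as "Label: number", which matches both the multi-line example above and a single-line summary.

```bash
#!/usr/bin/env bash
# Sketch: read counts from a Trimmomatic PE summary and check that they add up.
# tm_count and tm_check_totals are hypothetical helpers.
tm_count() {
  # $1 = log file, $2 = label, e.g. "Both Surviving"; prints the number without commas
  grep -o "$2: [0-9,]*" "$1" | head -n 1 | sed 's/.*: //; s/,//g'
}

tm_check_totals() {
  local log="$1" input both fwd rev dropped
  input=$(tm_count "$log" "Input Read Pairs")
  both=$(tm_count "$log" "Both Surviving")
  fwd=$(tm_count "$log" "Forward Only Surviving")
  rev=$(tm_count "$log" "Reverse Only Surviving")
  dropped=$(tm_count "$log" "Dropped")
  # succeeds only if the four outcome categories sum to the input pairs
  [ "$(( both + fwd + rev + dropped ))" -eq "$input" ]
}

# Example (log path is hypothetical):
# tm_check_totals qc/trimmomatic/SampleA.log && echo "counts consistent"
```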
Next Steps
- Re-run FastQC/MultiQC on trimmed/ to confirm adapter removal and quality improvement.
- Proceed to Read Alignment (STAR/HISAT2 or Salmon/Kallisto). With Trimmomatic, use the .paired.fastq.gz outputs as the R1/R2 inputs.
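Paired-end aligners expect R1 and R2 to contain the same reads in the same order, so a cheap check before alignment is that the two trimmed files hold the same number of records. A minimal sketch, assuming gzip-compressed FASTQ with four lines per record; fastq_count and check_pair are hypothetical helpers, and the check compares counts only, not read names.

```bash
#!/usr/bin/env bash
# Sketch: confirm that trimmed R1/R2 files hold the same number of reads.
# Assumes gzip-compressed FASTQ with 4 lines per record; helper names are hypothetical.
fastq_count() {
  echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

check_pair() {
  local n1 n2
  n1=$(fastq_count "$1")
  n2=$(fastq_count "$2")
  if [ "$n1" -ne "$n2" ]; then
    echo "read counts differ: $1=$n1 $2=$n2" >&2
    return 1
  fi
  echo "$n1"    # prints the shared read count on success
}

# Example:
# check_pair trimmed/SampleA_R1.trimmed.fastq.gz trimmed/SampleA_R2.trimmed.fastq.gz
```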