Trimmomatic - NBISweden/workshop-genome_assembly GitHub Wiki

Trimmomatic: A general-purpose trimming tool for Illumina sequencing reads.

Notes:

  • On UPPMAX, Trimmomatic files are found in $TRIMMOMATIC_HOME.
  • If you are trimming adapters, check that you are using the correct adapter file (ls $TRIMMOMATIC_HOME/adapters).
  • The tool bbmerge in the module bbmap can be used to discover adapters. See the workshop exercises for more details.
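
One way to act on the adapter-file note is to search the adapter FASTA files for a sequence you expect in your library. The sketch below is only an illustration: the function name adapter_files_containing is hypothetical, and it assumes the adapter files end in .fa and hold each sequence on a single line.

```bash
#!/usr/bin/env bash

# Hypothetical helper: list the FASTA files in a directory that contain a given sequence.
# Assumes each adapter sequence sits on one line (no wrapping).
adapter_files_containing () {
	local seq="$1"   # Adapter sequence to look for
	local dir="$2"   # Directory holding adapter FASTA files
	grep -l -- "$seq" "$dir"/*.fa
}

# Only query the real adapter directory when it exists (e.g. after loading the module)
if [ -d "${TRIMMOMATIC_HOME:-}/adapters" ]; then
	adapter_files_containing "AGATCGGAAGAGC" "$TRIMMOMATIC_HOME/adapters"
fi
```

Any file name printed is a candidate for the ILLUMINACLIP step; which one is right still depends on your library preparation kit.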

Command:

#!/usr/bin/env bash

module load bioinfo-tools trimmomatic/0.36

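CPUS="${SLURM_NPROCS:-8}"
# Bash arrays are zero-based, so the job array indices should run from 0 to (number of read pairs - 1)
JOB="$SLURM_ARRAY_TASK_ID"

DATA_DIR=/path/to/reads
FILES=( "$DATA_DIR"/*_R1.fastq.gz )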

apply_trimmomatic () {
	READ1="$1"      # Read 1 of the read pair to be screened
	READ2="$2"      # Read 2 of the read pair to be screened
	if [ "$READ1" == "$READ2" ]; then
		>&2 echo "READ1 and READ2 are the same file: the R2 pattern replacement failed. Please check the string substitution pattern further down."
		exit 2
	fi
	PREFIX=$(basename "${READ1%_R1*}")
	java -jar "$TRIMMOMATIC_HOME/trimmomatic-0.36.jar" PE -threads "$CPUS" "$READ1" "$READ2" \
		"${PREFIX}_clean_paired_1.fastq.gz" "${PREFIX}_clean_unpaired_1.fastq.gz" \
		"${PREFIX}_clean_paired_2.fastq.gz" "${PREFIX}_clean_unpaired_2.fastq.gz" \
		"ILLUMINACLIP:$TRIMMOMATIC_HOME/adapters/TruSeq3-PE-2.fa:2:30:10"
}

FASTQ="${FILES[$JOB]}"
apply_trimmomatic "$FASTQ" "${FASTQ/_R1./_R2.}"
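
The pairing logic in the script rests on two bash parameter expansions: ${FASTQ/_R1./_R2.} replaces the first "_R1." with "_R2.", and ${READ1%_R1*} strips the shortest suffix that starts at "_R1". A minimal sketch, using made-up file names, shows what each produces and why the script checks that READ1 and READ2 differ:

```bash
#!/usr/bin/env bash

FASTQ=/path/to/reads/sampleA_R1.fastq.gz    # Hypothetical file name

READ2="${FASTQ/_R1./_R2.}"                  # Replace the first "_R1." with "_R2."
PREFIX=$(basename "${FASTQ%_R1*}")          # Strip "_R1..." and then the directory

echo "$READ2"     # /path/to/reads/sampleA_R2.fastq.gz
echo "$PREFIX"    # sampleA

# A name without "_R1." passes through unchanged, which is the case the script's check catches
OTHER=/path/to/reads/sampleB_1.fastq.gz     # Hypothetical file name
echo "${OTHER/_R1./_R2.}"                   # /path/to/reads/sampleB_1.fastq.gz
```

If your files follow a different naming scheme, both the glob in FILES and the substitution pattern need to change together.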

Visualize adapters

It is good practice to check that your adapter sequence actually occurs in your reads. Here we assume that both adapters contain the sequence AGATCGGAAGAGC (TruSeq3), and that READ1 and READ2 are set to a pair of read files.

paste <( zcat "$READ1" ) <( zcat "$READ2" ) | grep -A2 -B1 --colour=always "AGATCGGAAGAGC" | less -SR
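
To put a number on what the coloured view shows, the matching reads can be counted instead of displayed. The following is a rough sketch under stated assumptions: count_adapter_reads is a hypothetical helper, the FASTQ files are gzip-compressed with four lines per record and unwrapped sequence lines, and only exact matches of the sequence are counted.

```bash
#!/usr/bin/env bash

# Hypothetical helper: count reads in a gzipped FASTQ whose sequence line contains SEQ.
count_adapter_reads () {
	local seq="$1"     # Adapter sequence to search for
	local fastq="$2"   # Gzip-compressed FASTQ file
	# Keep only the sequence line of each 4-line record, then count the lines that match
	gzip -dc "$fastq" | awk 'NR % 4 == 2' | grep -c -- "$seq"
}
```

For example, count_adapter_reads AGATCGGAAGAGC "$READ1" run before trimming and again on the paired output gives a quick indication of whether the ILLUMINACLIP step removed the adapter.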