Trimmomatic - NBISweden/workshop-genome_assembly GitHub Wiki
Trimmomatic: A general purpose trimming tool.
Notes:
- On UPPMAX, Trimmomatic files are found in $TRIMMOMATIC_HOME.
- Check you're using the correct adapter file if you're trimming for adapters ( ls $TRIMMOMATIC_HOME/adapters ).
- The tool bbmerge in the module bbmap can be used to discover adapters. See the workshop exercises for more details.
Command:
#!/usr/bin/env bash
module load bioinfo-tools trimmomatic/0.36

CPUS="${SLURM_NPROCS:-8}"
# Bash arrays are zero-indexed, so submit the array job with indices 0..N-1
JOB=$SLURM_ARRAY_TASK_ID
DATA_DIR=/path/to/reads
FILES=( "$DATA_DIR"/*_R1.fastq.gz )

apply_trimmomatic () {
    READ1="$1" # Read 1 of the read pair to be screened
    READ2="$2" # Read 2 of the read pair to be screened
    if [ "$READ1" == "$READ2" ]; then
        >&2 echo "READ1 and READ2 are the same file. The R2 pattern replacement failed. Please check the string substitution pattern in the call below."
        exit 2
    fi
    PREFIX=$(basename "${READ1%_R1*}")
    # ILLUMINACLIP:<adapter fasta>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>
    java -jar "$TRIMMOMATIC_HOME/trimmomatic-0.36.jar" PE -threads "$CPUS" "$READ1" "$READ2" \
        "${PREFIX}_clean_paired_1.fastq.gz" "${PREFIX}_clean_unpaired_1.fastq.gz" \
        "${PREFIX}_clean_paired_2.fastq.gz" "${PREFIX}_clean_unpaired_2.fastq.gz" \
        ILLUMINACLIP:"$TRIMMOMATIC_HOME"/adapters/TruSeq3-PE-2.fa:2:30:10
}

FASTQ="${FILES[$JOB]}"
apply_trimmomatic "$FASTQ" "${FASTQ/_R1./_R2.}"
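The script pairs files and names its outputs using bash parameter expansion. The following is a minimal sketch of how the read 2 filename and the output prefix are derived. The file name sample_A_R1.fastq.gz is hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical read 1 path, for illustration only
FASTQ="/path/to/reads/sample_A_R1.fastq.gz"

# Replace the first occurrence of "_R1." with "_R2." to get the mate file
READ2="${FASTQ/_R1./_R2.}"

# Strip the shortest suffix beginning with "_R1", then drop the directory part
PREFIX=$(basename "${FASTQ%_R1*}")

echo "$READ2"
echo "$PREFIX"
```

If your files use a different naming scheme (for example _1.fastq.gz and _2.fastq.gz), both patterns need adjusting. Otherwise the substitution leaves the name unchanged and the script exits with the "same file" error.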
Visualize adapters
It is good practice to check that your adapter sequence is actually present in your reads. Here we assume both adapters contain the sequence AGATCGGAAGAGC (TruSeq3), and that READ1 and READ2 are set to the two files of a read pair.
paste <( zcat $READ1 ) <( zcat $READ2 ) | grep -A2 -B1 --colour=always "AGATCGGAAGAGC" | less -SR
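To complement the visual check, it can help to count how many reads contain the adapter sequence. The following is a rough sketch, not part of the workshop material. The function name count_adapter_reads is hypothetical, and it assumes standard four-line FASTQ records in a gzipped file.

```shell
#!/usr/bin/env bash
# Count reads in a gzipped FASTQ file whose sequence line contains a motif.
# Assumes four lines per record (header, sequence, "+", qualities).
count_adapter_reads () {
    local fastq="$1"    # gzipped FASTQ file
    local adapter="$2"  # adapter sequence to search for
    # Sequence lines are every fourth line, starting at line 2
    gzip -cd "$fastq" |
        awk -v a="$adapter" 'NR % 4 == 2 && index($0, a) { n++ } END { print n + 0 }'
}
```

For example, count_adapter_reads "$READ1" AGATCGGAAGAGC prints the number of read 1 sequences containing the TruSeq3 motif. Running it on the files before and after trimming gives a quick indication of whether adapter clipping worked.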