2. Clean HiFi reads

Get the software

HiFiAdapterFilt is available from https://github.com/sheinasim/HiFiAdapterFilt

Navigate to your software directory, load the git module, and clone the repository:

cd /project/pepper/software
module load git
git clone https://github.com/sheinasim/HiFiAdapterFilt

Raw data files

m54334U_210619_035502.ccs.bam
m54334U_210620_102256.ccs.bam
m54334U_210626_080920.ccs.bam
m54334U_210809_192015.hifi_reads.bam
m54334U_210811_041717.hifi_reads.bam
m54334U_210817_204240.hifi_reads.bam
m54334U_210819_043653.hifi_reads.bam

Clean the reads

Then, from the directory that contains your HiFi BAM files, run this script:

Input reads

#!/bin/sh
#SBATCH --job-name="HiFiAdaptFilt_"
#SBATCH -p mem
#SBATCH --mem=10G
#SBATCH -n 8
#SBATCH --output="%x_%j.o" # job standard output file (%x is replaced by the job name, %j by the job ID)
#SBATCH --error="%x_%j.e" # job standard error file (%x is replaced by the job name, %j by the job ID)

module load bamtools blast+

export PATH=$PATH:/project/pepper/software/HiFiAdapterFilt
export PATH=$PATH:/project/pepper/software/HiFiAdapterFilt/DB

sh hifiadapterfilt.sh -p m54 -t 8
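
The job above relies on the bamtools and blast+ modules being loaded before the filter runs. As a minimal sketch, a check like the following could be placed just before the final command to report any tool that is not on PATH. The helper name `missing_tools` is hypothetical and not part of HiFiAdapterFilt; `blastn` is the BLAST+ program assumed to be the one needed here.

```shell
# missing_tools: print the name of every given command that is not on PATH.
# Hypothetical helper for illustration; not part of HiFiAdapterFilt.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumption: bamtools and blastn are the executables provided by the
# bamtools and blast+ modules loaded in the job script.
missing=$(missing_tools bamtools blastn)
if [ -n "$missing" ]; then
    echo "Not found on PATH: $missing" >&2
fi
```

If the check prints nothing, both tools were found; otherwise the missing names are written to standard error, which ends up in the job's .e file.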

Notes

Run the script from the directory that contains the BAM files. It cannot be pointed at files in a different directory.
Pass the file prefix to -p without the file extension (i.e. omit '.bam').
Pointing the output to a subdirectory also causes an error. Run the script in the same directory as the BAM files and do not specify an output directory.
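
Because the script only works from the directory holding the BAM files, and file names are given without '.bam', a quick look at what is in the current directory can save a failed job. The following is a small sketch, not part of HiFiAdapterFilt; `list_bam_prefixes` is a hypothetical helper name.

```shell
# list_bam_prefixes: print each BAM file in the current directory with the
# trailing ".bam" removed. Hypothetical helper for illustration only.
list_bam_prefixes() {
    for f in ./*.bam; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        basename "$f" .bam
    done
}

if [ -n "$(list_bam_prefixes)" ]; then
    list_bam_prefixes
else
    echo "No .bam files here; cd to the directory that holds the reads first." >&2
fi
```

Run in the raw data directory, this would print names such as m54334U_210619_035502.ccs, all of which begin with the m54 prefix used in the job script.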

Output files

For each input BAM file, the script produces a filtered fastq.gz, a .contaminant.blastout summary, and a .stats file reporting how many reads in that file were filtered.

m54334U_210619_035502.filt.fastq.gz
m54334U_210619_035502.ccs.contaminant.blastout
m54334U_210619_035502.ccs.stats

For our HiFi reads, about 0.05% of the reads contained adapter sequences (roughly 1,000 to 1,500 reads). The software removes adapter-contaminated reads entirely rather than trimming the adapters off.
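
The percentage can be double-checked with a little arithmetic. The sketch below assumes the filtered output is a standard four-line-per-record FASTQ, and that the total and removed read counts are read by hand from the .stats file. Both helper names are hypothetical, and the numbers in the example call are illustrative, not measured values from this dataset.

```shell
# count_fastq_reads: number of records in a gzipped FASTQ file,
# assuming four lines per record.
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# percent_removed: removed reads as a percentage of all input reads.
# Arguments: total read count, removed read count.
percent_removed() {
    awk -v total="$1" -v removed="$2" 'BEGIN {
        if (total <= 0) { print "NA"; exit }
        printf "%.3f\n", 100 * removed / total
    }'
}

# Illustrative numbers only: 1,000 reads removed out of 2,000,000.
percent_removed 2000000 1000   # prints 0.050
```

Calling count_fastq_reads on one of the .filt.fastq.gz files gives the number of reads retained, which should equal the total minus the removed count.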