2. Clean HiFi reads - USDA-ARS-GBRU/Pepper_TrioBinning GitHub Wiki
Get the software
HiFiAdapterFilt is available from https://github.com/sheinasim/HiFiAdapterFilt
Navigate to your software directory, load the git module, and clone the repository:
cd /project/pepper/software
module load git
git clone https://github.com/sheinasim/HiFiAdapterFilt
Raw data files
- m54334U_210619_035502.ccs.bam
- m54334U_210620_102256.ccs.bam
- m54334U_210626_080920.ccs.bam
- m54334U_210809_192015.hifi_reads.bam
- m54334U_210811_041717.hifi_reads.bam
- m54334U_210817_204240.hifi_reads.bam
- m54334U_210819_043653.hifi_reads.bam
Clean the reads
Then, in the same directory as your HiFi reads, run this script:
- Input reads
#!/bin/sh
#SBATCH --job-name="HiFiAdaptFilt_"
#SBATCH -p mem
#SBATCH --mem=10G
#SBATCH -n 8
#SBATCH --output="%x_%j.o" # job standard output file (%x replaced by job name, %j by job id)
#SBATCH --error="%x_%j.e" # job standard error file (%x replaced by job name, %j by job id)
module load bamtools blast+
export PATH=$PATH:/project/pepper/software/HiFiAdapterFilt
export PATH=$PATH:/project/pepper/software/HiFiAdapterFilt/DB
sh hifiadapterfilt.sh -p m54 -t 8
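Because the script depends on tools provided by the loaded modules, a quick pre-flight check can catch a missing module before the job fails. The following is a minimal sketch, not part of HiFiAdapterFilt: `check_tools` is a hypothetical helper, and it assumes that the bamtools and blast+ modules put executables named `bamtools` and `blastn` on the PATH.

```sh
# Hypothetical helper (not part of HiFiAdapterFilt): report every command
# in the argument list that cannot be found on the current PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Return 0 when everything was found, 1 otherwise.
    return "$missing"
}

# Example call, placed after `module load bamtools blast+`
# (executable names are assumptions):
#   check_tools bamtools blastn || exit 1
```

Placing the call before the filtering command stops the job early with a clear message instead of failing partway through.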
Notes
- Run the script in the same directory as the BAM files. Pointing it at files in a different directory does not work.
- Do not include the file extension (e.g. '.bam') in the prefix given to -p.
- Pointing the output to a subdirectory also causes an error. Run the script in the same directory as the BAM files and do not specify an output directory.
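If the outputs should end up in their own subdirectory, one workaround is to let the job finish in the BAM directory and move the results afterwards. This is a minimal sketch under stated assumptions: `collect_outputs` and the `filtered` directory name are hypothetical, and the file suffixes are the three listed under Output files.

```sh
# Hypothetical helper: after the job has finished, move the filtering
# outputs from the read directory into a subdirectory of it.
# The input BAM files are left where they are.
collect_outputs() {
    read_dir=$1
    dest=$2
    mkdir -p "$read_dir/$dest"
    for f in "$read_dir"/*.filt.fastq.gz \
             "$read_dir"/*.contaminant.blastout \
             "$read_dir"/*.stats; do
        # A pattern that matched nothing stays unexpanded; skip it.
        [ -e "$f" ] || continue
        mv "$f" "$read_dir/$dest/"
    done
}

# Example call from the directory that holds the BAM files:
#   collect_outputs . filtered
```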
Output files
Each input BAM file yields a filtered .fastq.gz file, a .contaminant.blastout summary, and a .stats file reporting how many reads in that file were filtered.
- m54334U_210619_035502.filt.fastq.gz
- m54334U_210619_035502.ccs.contaminant.blastout
- m54334U_210619_035502.ccs.stats
For our HiFi reads, about 0.05% of the reads contained adapter sequences (roughly 1,000 to 1,500 reads). The software removes adapter-contaminated reads entirely rather than trimming the adapters off.
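To sanity-check these figures independently of the .stats file, the retained reads can be counted directly from a filtered FASTQ file and the removed fraction recomputed. The sketch below uses two hypothetical helpers, `count_fastq_reads` and `pct_removed`. It assumes standard four-line FASTQ records, and the counts in the example call are illustrative values chosen to give 0.05%, not measured results.

```sh
# Count reads in a gzipped FASTQ file, assuming four lines per record.
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Percentage of reads removed, given the removed and retained read counts.
pct_removed() {
    awk -v removed="$1" -v retained="$2" \
        'BEGIN { printf "%.4f\n", 100 * removed / (removed + retained) }'
}

# Example calls (illustrative numbers only):
#   count_fastq_reads m54334U_210619_035502.filt.fastq.gz
#   pct_removed 1500 2998500    # prints 0.0500
```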