Read Mapping - erinvanberkel/EPP622-Test-2 GitHub Wiki
BWA
Linking trimmed samples to bwa folder
ln -s ../2_skewer/SRR6922236_1_trimmed-trimmed.fastq
spack load bwa
spack load [email protected]
Performing actual bwa analysis and converting the files to bam format
bwa mem -t 3 \
solenopsis_genome_index/solenopsis_invicta_genome.fa.gz \
SRR6922236_1_trimmed-trimmed.fastq \
| samtools sort -@ 3 -m 4G \
-o SRR6922236_1_sorted.bam
Pulling out the stats from the output sorted bam to view primary and supplemental mapping numbers
samtools flagstat SRR6922236_1_sorted.bam > SRR6922236_1_sorted.bam.stats
head SRR6922236_1_sorted.bam.stats
So the percentage of read maps is super high-99.18%. To make this number match the number from the original fastq, I'm using bc.
spack load bc%[email protected]
echo $(cat SRR6922236_1_trimmed-trimmed.fastq | wc -l)/4 | bc
Then to find the actual percentage of mapped reads you divide the number of reads from flagstat from the number from this command.
Add/Replace Read Groups
/pickett_shared/software/gatk-4.2.6.1/gatk \
AddOrReplaceReadGroups \
--INPUT SRR6922236_1_sorted.bam \
--OUTPUT SRR6922236_1_sorted.RG.bam \
--RGSM SRR6922236_1 \
--RGLB SRR6922236_1 \
--RGPL ILLUMINA \
--RGPU SRR6922236_1
Samtools index of the read groups
spack load [email protected]
samtools index SRR6922236_1_sorted.RG.bam