Read Mapping - erinvanberkel/EPP622-Test-2 GitHub Wiki

BWA

Linking trimmed samples to bwa folder

ln -s ../2_skewer/SRR6922236_1_trimmed-trimmed.fastq

spack load bwa
spack load [email protected]

Performing actual bwa analysis and converting the files to bam format

bwa mem -t 3 \
solenopsis_genome_index/solenopsis_invicta_genome.fa.gz \
SRR6922236_1_trimmed-trimmed.fastq \
| samtools sort -@ 3 -m 4G \
-o SRR6922236_1_sorted.bam

Pulling out the stats from the output sorted bam to view primary and supplemental mapping numbers

samtools flagstat SRR6922236_1_sorted.bam > SRR6922236_1_sorted.bam.stats
head SRR6922236_1_sorted.bam.stats

So the percentage of read maps is super high-99.18%. To make this number match the number from the original fastq, I'm using bc.

spack load bc%[email protected]
echo $(cat SRR6922236_1_trimmed-trimmed.fastq | wc -l)/4 | bc

Then to find the actual percentage of mapped reads you divide the number of reads from flagstat from the number from this command.

Add/Replace Read Groups

/pickett_shared/software/gatk-4.2.6.1/gatk \
    AddOrReplaceReadGroups \
    --INPUT SRR6922236_1_sorted.bam \
    --OUTPUT SRR6922236_1_sorted.RG.bam \
    --RGSM SRR6922236_1 \
    --RGLB SRR6922236_1 \
    --RGPL ILLUMINA \
    --RGPU SRR6922236_1

Samtools index of the read groups

spack load [email protected]
samtools index SRR6922236_1_sorted.RG.bam