GATK - erinvanberkel/EPP622-Test-2 GitHub Wiki

Make a new directory for the GATK analysis and move into it

mkdir 4_gatk
cd 4_gatk

Link the read-group (RG) BAM and its index from the previous page (3_bwa) into this directory

ln -s $(readlink -e ../3_bwa/*_sorted.RG.*) ./
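Before running GATK it can help to confirm that the links point at real files. The following is a minimal sketch, not part of the original workflow; `check_links` is a hypothetical helper name, and it only inspects the current directory.

```shell
# Sketch: report any symlink in the current directory whose target is missing.
# check_links is a made-up helper name for illustration.
check_links() {
    status=0
    for f in ./*; do
        # -L is true for a symlink; -e follows the link and fails if the target is gone
        if [ -L "$f" ] && [ ! -e "$f" ]; then
            echo "broken link: $f"
            status=1
        fi
    done
    return $status
}

check_links && echo "all links resolve"
```

If a link is reported as broken, re-check the path given to `readlink -e` in the `ln -s` command.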

Haplotype Caller

/pickett_shared/software/gatk-4.2.6.1/gatk \
--java-options "-Xmx4G" \
HaplotypeCaller \
-R solenopsis_invicta_genome.fa.gz \
-I SRR6922236_1_sorted.RG.bam \
-O SRR6922236_1_NC_052664.1.vcf \
-bamout SRR6922236_1_sorted_NC_052664.1.RG.realigned.bam \
-L NC_052664.1

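This calls variants for one sample at a time, restricted to chromosome NC_052664.1 by the -L option. The -bamout option writes the reads as HaplotypeCaller realigned them, which is useful for viewing in IGV. The example above uses sample SRR6922236_1, while the remaining commands on this page use SRR6922141_1, so substitute your own sample ID throughout.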
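Because HaplotypeCaller runs on one sample at a time, the same command can be wrapped in a loop over every linked BAM. This is a minimal sketch rather than the class's official workflow: the helper name `hc_command` and the `DRY_RUN` switch are made up for illustration, and it assumes every BAM follows the `<sample>_sorted.RG.bam` naming used above. By default it only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: build one HaplotypeCaller command per sample BAM.
# Paths and options are copied from the command above.
GATK=/pickett_shared/software/gatk-4.2.6.1/gatk
REF=solenopsis_invicta_genome.fa.gz
REGION=NC_052664.1

# hc_command (hypothetical helper) prints the command for the sample ID in $1.
hc_command() {
    printf '%s --java-options "-Xmx4G" HaplotypeCaller -R %s -I %s -O %s -bamout %s -L %s\n' \
        "$GATK" "$REF" \
        "${1}_sorted.RG.bam" \
        "${1}_${REGION}.vcf" \
        "${1}_sorted_${REGION}.RG.realigned.bam" \
        "$REGION"
}

for bam in *_sorted.RG.bam; do
    [ -e "$bam" ] || continue             # the glob matched nothing
    sample=${bam%_sorted.RG.bam}          # strip the suffix to get the sample ID
    if [ "${DRY_RUN:-1}" = 1 ]; then
        hc_command "$sample"              # print only (default)
    else
        eval "$(hc_command "$sample")"    # actually run GATK
    fi
done
```

Run it as is to preview the commands, then set `DRY_RUN=0` to execute them.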

To check the output, print the column header line and the first few variant records, skipping the ## meta-information lines.

grep -v '^##' SRR6922141_1_NC_052664.1.vcf | head -n 5
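To see why the pattern is `^##` and not `^#`, here is a small sketch on a toy VCF. The records are made up for illustration and do not come from the real data. The `##` meta lines are dropped, while the single-`#` column header line is kept.

```shell
# Toy VCF with made-up records, for illustration only.
toy=$(mktemp)
{
    printf '##fileformat=VCFv4.2\n'
    printf '##contig=<ID=NC_052664.1>\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'NC_052664.1\t101\t.\tA\tG\t50\t.\t.\n'
    printf 'NC_052664.1\t250\t.\tT\tTA\t42\t.\t.\n'
} > "$toy"

# Drop the '##' meta lines; the '#CHROM' column header line survives.
preview=$(grep -v '^##' "$toy" | head -n 5)
printf '%s\n' "$preview"

# Count data records only (no header lines of either kind).
n_records=$(grep -vc '^#' "$toy")
echo "records: $n_records"
```

On this toy file the preview is the `#CHROM` line followed by the two records, and the last line prints `records: 2`.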

Count the number of SNPs and indels. Each command keeps only one variant type, removes the header lines, and counts the remaining lines (one per variant record).

spack load bcftools
bcftools view -v snps SRR6922141_1_NC_052664.1.vcf | grep -v "^#" | wc -l
bcftools view -v indels SRR6922141_1_NC_052664.1.vcf | grep -v "^#" | wc -l
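As a rough cross-check without bcftools, the REF and ALT columns can be compared directly with awk. This is only a sketch under simplifying assumptions: `count_by_type` is a hypothetical helper, the toy records are made up, multiallelic sites are tallied separately instead of being classified, and bcftools remains the authoritative count.

```shell
# Toy VCF with made-up records, for illustration only.
toy=$(mktemp)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'NC_052664.1\t101\t.\tA\tG\t50\t.\t.\n'
    printf 'NC_052664.1\t250\t.\tT\tTA\t42\t.\t.\n'
    printf 'NC_052664.1\t300\t.\tGCC\tG\t61\t.\t.\n'
    printf 'NC_052664.1\t412\t.\tC\tT\t38\t.\t.\n'
    printf 'NC_052664.1\t500\t.\tA\tG,T\t70\t.\t.\n'
} > "$toy"

# count_by_type (hypothetical helper): $1 is a VCF path.
# Column 4 is REF and column 5 is ALT.
count_by_type() {
    awk -F'\t' '
        /^#/                               { next }              # skip all header lines
        $5 ~ /,/                           { multi++; next }     # more than one ALT allele
        length($4) == 1 && length($5) == 1 { snps++; next }      # single-base substitution
        length($4) != length($5)           { indels++ }          # insertion or deletion
        END { printf "snps=%d indels=%d multiallelic=%d\n", snps, indels, multi }
    ' "$1"
}

count_by_type "$toy"
```

On the toy file this prints `snps=2 indels=2 multiallelic=1`. On real data the numbers may differ slightly from bcftools, which classifies each allele of a multiallelic site.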

Download files to your computer for IGV visualization

scp [email protected]:/pickett_sphinx/teaching/EPP622_2024/test2/analysis/evanberk/4_gatk/solenopsis_invicta_genome.fa .
scp [email protected]:/pickett_sphinx/teaching/EPP622_2024/test2/analysis/evanberk/4_gatk/solenopsis_invicta_genome.fa.fai .
scp [email protected]:/pickett_sphinx/teaching/EPP622_2024/test2/analysis/evanberk/4_gatk/SRR6922141_1_sorted.RG.bam .
scp [email protected]:/pickett_sphinx/teaching/EPP622_2024/test2/analysis/evanberk/4_gatk/SRR6922141_1_sorted.RG.bam.bai .
scp [email protected]:/pickett_sphinx/teaching/EPP622_2024/test2/analysis/evanberk/4_gatk/SRR6922141_1_sorted_NC_052664.1.RG.realigned.bam .
scp [email protected]:/pickett_sphinx/teaching/EPP622_2024/test2/analysis/evanberk/4_gatk/SRR6922141_1_sorted_NC_052664.1.RG.realigned.bai .
scp [email protected]:/pickett_sphinx/teaching/EPP622_2024/test2/analysis/evanberk/4_gatk/SRR6922141_1_NC_052664.1.vcf .
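The seven downloads differ only in the file name, so they can also be written as a loop. This is a sketch to run on your own computer: `REMOTE` is a placeholder because the real login is not shown on this page, and the `DRY_RUN` switch is made up for illustration. By default it only prints the scp commands.

```shell
# Placeholder login: set REMOTE to your own user@host before running for real.
REMOTE=${REMOTE:-user@host}
REMOTE_DIR=/pickett_sphinx/teaching/EPP622_2024/test2/analysis/evanberk/4_gatk
SAMPLE=SRR6922141_1
REGION=NC_052664.1

# Reference, alignments, their indexes, and the VCF needed by IGV.
files="
solenopsis_invicta_genome.fa
solenopsis_invicta_genome.fa.fai
${SAMPLE}_sorted.RG.bam
${SAMPLE}_sorted.RG.bam.bai
${SAMPLE}_sorted_${REGION}.RG.realigned.bam
${SAMPLE}_sorted_${REGION}.RG.realigned.bai
${SAMPLE}_${REGION}.vcf
"

for f in $files; do
    cmd="scp ${REMOTE}:${REMOTE_DIR}/${f} ."
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$cmd"      # print only (default)
    else
        $cmd             # actually copy the file
    fi
done
```

Set `DRY_RUN=0` once the printed commands look right.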