Calling Variants - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki
Calling Variants
Pileup-based callers (bcftools mpileup + call)
- Exit your InSilicoSeq virtual environment (if youโre still in it):
deactivate
-
Install bcftools and tabix (needed to call and index VCFs):
sudo apt update sudo apt install -y bcftools tabix
-
Re-enter your project directory and (re)activate your ISS env:
cd /mnt/c/Users/IAGhe/OneDrive/Documents/Bioinformatics source iss_env/bin/activate
-
Run the pileup โ call pipeline:
bcftools mpileup \ -f ref_genome.fa \ -Ou \ aln.filtered.bam \ | bcftools call \ -mv \ -Oz \ -o raw_snps_indels.vcf.gz
-
Index the VCF (for fast querying):
tabix -p vcf raw_snps_indels.vcf.gz
-
Take a peek at your variants:
bcftools view raw_snps_indels.vcf.gz | head -n 30
-
(Optional) View in a nice column format:
bcftools view raw_snps_indels.vcf.gz \ | column -t \ | less -S
All of the above is adapted from the InSilicoSeq hands-on demo.
Haplotype-aware callers
- GATK HaplotypeCaller
# Requires GATK installed (via Conda or you can download the jar)
gatk HaplotypeCaller \
-R ref_genome.fa \
-I aln.filtered.bam \
-O gatk_variants.vcf.gz
- FreeBayes
# A lightweight, multiallelic haplotype caller
freebayes \
-f ref_genome.fa \
aln.filtered.bam \
> freebayes_variants.vcf
(Commands to run these would follow a similar pattern: index the BAM, then invoke the caller with -R ref_genome.fa
, -I aln.sorted.bam
, and appropriate output flags.)
Long-read callers
- Medaka (for ONT data)
# After mapping ONT reads with minimap2
medaka_variant \
-i ont_reads.fastq.gz \
-d ref_genome.fa \
-b ont_aln.bam \
-o medaka_variants
- Clair3
# Clair3 Docker or conda install; specify platform 'ont' or 'hifi'
clair3.sh \
--bam ont_aln.bam \
--ref ref_genome.fa \
--threads 8 \
--platform ont \
--output clair3_out
(These tools take long-read alignments (e.g. minimap2 โ BAM), then call variants using neural-network models trained on long-read error profiles.)