Variant Calling - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki
Variant Calling & VCF Processing
-
Overview
What is variant calling, types of variants, and why we use VCFs
-
- Pileup‐based callers (bcftools mpileup + call)
- Haplotype‐aware callers (GATK HaplotypeCaller, FreeBayes)
- Long‐read callers (Medaka, Clair3)
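To make the idea behind pileup-based calling concrete, here is a toy sketch that picks the most likely diploid genotype from reference and alternate read counts at one site. It assumes a simple binomial model with a fixed error rate, and it ignores base qualities, mapping qualities, and priors, so it is not the algorithm used by bcftools or any other caller listed above. The function names and the `error` parameter are hypothetical.

```python
import math

def genotype_log_likelihoods(n_ref, n_alt, error=0.01):
    """Log-likelihood of the read counts under genotypes 0/0, 0/1 and 1/1.

    Assumption: each read shows the ALT allele with probability `error`
    (hom-ref), 0.5 (het) or 1 - error (hom-alt), independently of other reads.
    """
    p_alt = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    return {
        gt: n_alt * math.log(p) + n_ref * math.log(1.0 - p)
        for gt, p in p_alt.items()
    }

def call_genotype(n_ref, n_alt, error=0.01):
    """Return the genotype with the highest log-likelihood."""
    lls = genotype_log_likelihoods(n_ref, n_alt, error)
    return max(lls, key=lls.get)

# Made-up read counts for three sites
hom_ref = call_genotype(n_ref=20, n_alt=0)
het = call_genotype(n_ref=10, n_alt=9)
hom_alt = call_genotype(n_ref=1, n_alt=18)
```

Real callers add per-base error probabilities and population priors on top of this kind of likelihood comparison; haplotype-aware callers additionally reassemble reads locally before genotyping.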
-
- Mandatory columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), plus FORMAT and per-sample columns when genotype data are present
- INFO and FORMAT subfields
- Genotype encoding (GT, DP, GQ)
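The column layout and genotype encoding listed above can be illustrated with a minimal parser for a single VCF data line. This is a sketch, not a full VCF reader: it skips header handling and type information from the header, and the example line is made up for illustration. The helper names (`parse_info`, `parse_record`, `decode_gt`) are hypothetical.

```python
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

def parse_info(info):
    """Split the INFO column into a dict; flags without '=' map to True."""
    if info == ".":
        return {}
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out

def parse_record(line):
    """Parse one tab-separated VCF data line into fixed fields and samples."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(FIXED_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")
    record["INFO"] = parse_info(record["INFO"])
    samples = []
    if len(fields) > 9:
        # Column 9 (FORMAT) names the colon-separated per-sample subfields
        keys = fields[8].split(":")
        for sample in fields[9:]:
            samples.append(dict(zip(keys, sample.split(":"))))
    record["samples"] = samples
    return record

def decode_gt(gt, ref, alts):
    """Translate a GT string such as '0/1' or '1|2' into allele sequences."""
    alleles = [ref] + alts
    sep = "|" if "|" in gt else "/"   # '|' = phased, '/' = unphased
    return [None if i == "." else alleles[int(i)] for i in gt.split(sep)]

# Made-up example line with two samples
line = "chr1\t12345\t.\tA\tG,T\t50.0\tPASS\tDP=30;DB\tGT:DP:GQ\t0/1:14:99\t1|2:16:80"
rec = parse_record(line)
first = decode_gt(rec["samples"][0]["GT"], rec["REF"], rec["ALT"])
second = decode_gt(rec["samples"][1]["GT"], rec["REF"], rec["ALT"])
```

In GT, index 0 refers to REF and indices 1, 2, ... refer to the comma-separated ALT alleles in order, which is why `decode_gt` builds the allele list as `[ref] + alts`.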
-
Basic VCF Operations (bcftools & vcftools)
- View, filter & query (`bcftools view`, `vcftools --recode`)
- Sort & index (`bcftools sort`, `bcftools index`)
- Summary stats (`bcftools stats`, `vcftools --freq`)
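Sorting matters because indexing expects records grouped by contig and ordered by position. The sketch below shows that ordering rule on in-memory tuples. It assumes the contig order is supplied as a list (for example, taken from the header), and it only illustrates the concept; it is not a replacement for the tools above. The function name and the records are hypothetical.

```python
def sort_records(records, contig_order):
    """Sort (chrom, pos, ref, alt) tuples by contig rank, then by position."""
    rank = {name: i for i, name in enumerate(contig_order)}
    return sorted(records, key=lambda r: (rank[r[0]], r[1]))

# Made-up, unsorted records
records = [
    ("chr2", 500, "A", "G"),
    ("chr1", 900, "C", "T"),
    ("chr10", 100, "G", "A"),
    ("chr1", 150, "T", "C"),
]
ordered = sort_records(records, contig_order=["chr1", "chr2", "chr10"])
```

Ranking contigs by their position in the supplied list avoids the plain string sort placing `chr10` before `chr2`.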
-
- Hard filters by QUAL, DP, MQ, etc.
- Variant Quality Score Recalibration (VQSR) in GATK
- Filtering by functional impact
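A hard filter is a set of fixed thresholds applied to each record. The sketch below checks QUAL, DP, and MQ and returns the names of the filters that failed. The thresholds and the filter labels (`LowQual`, `LowDP`, `LowMQ`) are illustrative assumptions, not recommended values; suitable cut-offs depend on the data set and the caller.

```python
def hard_filter(qual, info, min_qual=30.0, min_dp=10, min_mq=40.0):
    """Return the names of failed filters; an empty list means the record passes.

    `info` is a dict of INFO values as strings. Missing values are treated
    as failures here, which is a design choice of this sketch.
    """
    failed = []
    if qual is None or qual < min_qual:
        failed.append("LowQual")
    dp = info.get("DP")
    if dp is None or int(dp) < min_dp:
        failed.append("LowDP")
    mq = info.get("MQ")
    if mq is None or float(mq) < min_mq:
        failed.append("LowMQ")
    return failed

def filter_field(failed):
    """Format the FILTER column: PASS, or failed filter names joined by ';'."""
    return ";".join(failed) if failed else "PASS"

# Made-up records
good = hard_filter(55.0, {"DP": "25", "MQ": "60"})
bad = hard_filter(12.0, {"DP": "4", "MQ": "60"})
```

Recording which filters failed, rather than dropping the record, keeps the decision reversible if thresholds are revised later.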
-
- SnpEff / VEP for effect prediction
- Adding population frequency and clinical data
-
- Multi‐sample merging (`bcftools merge`)
- Intersection & differences (`bcftools isec`)
- Concordance metrics
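Intersection, differences, and concordance can be sketched with plain set operations. The example assumes each call set is a dict mapping `(chrom, pos, ref, alt)` to one sample's GT string, and it defines genotype concordance as the fraction of shared sites with the same unordered genotype. That definition is one of several in use, and all names and data here are hypothetical.

```python
def compare_callsets(a, b):
    """Compare two call sets keyed by (chrom, pos, ref, alt) -> GT string."""
    shared = set(a) & set(b)
    only_a = set(a) - set(b)
    only_b = set(b) - set(a)

    def norm(gt):
        # Ignore phasing and allele order: '1|0' and '0/1' compare equal
        return tuple(sorted(gt.replace("|", "/").split("/")))

    matching = sum(1 for key in shared if norm(a[key]) == norm(b[key]))
    concordance = matching / len(shared) if shared else None
    return shared, only_a, only_b, concordance

# Made-up call sets for the same sample from two callers
calls_a = {
    ("chr1", 100, "A", "G"): "0/1",
    ("chr1", 200, "C", "T"): "1/1",
    ("chr1", 300, "G", "A"): "0/1",
}
calls_b = {
    ("chr1", 100, "A", "G"): "1|0",
    ("chr1", 200, "C", "T"): "0/1",
    ("chr1", 400, "T", "C"): "1/1",
}
shared, only_a, only_b, concordance = compare_callsets(calls_a, calls_b)
```

Matching on the exact `(chrom, pos, ref, alt)` key assumes both call sets represent variants the same way; indels in particular may need normalization before a comparison like this is meaningful.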
-
- Allele frequency spectrum (`vcftools --freq`)
- Linkage disequilibrium (`vcftools --geno-r2`)
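Both statistics above can be computed from genotype strings. The sketch below assumes biallelic, diploid sites, counts ALT alleles per sample as a dosage of 0, 1, or 2, and computes r² as the squared Pearson correlation between dosages at two sites. It is meant to show the arithmetic, not to reproduce any tool's output exactly; the function names and genotypes are hypothetical.

```python
def alt_dosage(gt):
    """Count ALT alleles in a diploid GT; None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for a in alleles if a != "0")

def alt_allele_frequency(gts):
    """ALT allele frequency over samples with non-missing genotypes."""
    doses = [d for d in map(alt_dosage, gts) if d is not None]
    return sum(doses) / (2 * len(doses)) if doses else None

def genotype_r2(gts_x, gts_y):
    """Squared Pearson correlation between ALT dosages at two sites."""
    pairs = [
        (x, y)
        for x, y in zip(map(alt_dosage, gts_x), map(alt_dosage, gts_y))
        if x is not None and y is not None
    ]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # a monomorphic site has no defined correlation
    return cov * cov / (var_x * var_y)

# Made-up genotypes for four samples at three sites
site1 = ["0/0", "0/1", "1/1", "0/1"]
site2 = ["0/0", "0/1", "1/1", "0/1"]
site3 = ["0/1", "0/1", "0/1", "./."]
freq1 = alt_allele_frequency(site1)
r2_12 = genotype_r2(site1, site2)
r2_13 = genotype_r2(site1, site3)
```

Working on unphased dosages means this measures genotype correlation rather than haplotype-level LD, which would require phased data.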
-
- Loading VCF tracks in IGV
- Plotting variant density or impact in R
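Before plotting variant density, positions are usually binned into fixed-size windows. The sketch below shows that binning step in Python for a single chromosome; the resulting counts could then be plotted in any tool. The window size, function name, and positions are assumptions made for illustration.

```python
from collections import Counter

def variant_density(positions, window=1000):
    """Count variants per fixed-size window.

    Keys are 1-based window start coordinates, so with window=1000 the
    windows are 1-1000, 1001-2000, and so on.
    """
    counts = Counter()
    for pos in positions:
        start = (pos - 1) // window * window + 1
        counts[start] += 1
    return dict(sorted(counts.items()))

# Made-up 1-based variant positions on one chromosome
density = variant_density([5, 900, 1000, 1001, 2500, 2999], window=1000)
```

Windows with no variants are absent from the result, so a plot that should show empty regions needs them filled in with zero.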
-
- Call SNPs and indels on your assembled/simulated dataset
- Convert, filter & index the VCF
- Annotate with SnpEff and summarize key variant metrics
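For the final summary step, two commonly reported metrics are the counts of SNPs versus indels and the transition/transversion (Ts/Tv) ratio. The sketch below computes both from `(REF, ALT)` pairs with one ALT allele each. It groups multi-nucleotide substitutions and symbolic alleles under `other`, which is a simplification of this example, and the names and input data are hypothetical.

```python
from collections import Counter

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def classify(ref, alt):
    """Classify one REF/ALT pair as transition, transversion, indel or other."""
    ref, alt = ref.upper(), alt.upper()
    if not set(ref + alt) <= set("ACGT"):
        return "other"      # symbolic alleles such as <DEL>, or ambiguous bases
    if len(ref) == 1 and len(alt) == 1:
        return "transition" if (ref, alt) in TRANSITIONS else "transversion"
    if len(ref) != len(alt):
        return "indel"
    return "other"          # same-length multi-nucleotide substitutions

def summarize(variants):
    """Return per-class counts and the Ts/Tv ratio (None without transversions)."""
    counts = Counter(classify(ref, alt) for ref, alt in variants)
    ts, tv = counts["transition"], counts["transversion"]
    return dict(counts), (ts / tv if tv else None)

# Made-up REF/ALT pairs
variants = [("A", "G"), ("C", "T"), ("A", "C"), ("AT", "A"), ("G", "GTT"), ("A", "<DEL>")]
counts, tstv = summarize(variants)
```

Multi-allelic records would need to be split into one pair per ALT allele before being passed to `summarize`.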