Variant Calling - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki

Variant Calling & VCF Processing

  • Overview
    What variant calling is, the main types of variants, and why we use VCF files

  • Calling Variants

    • Pileup‐based callers (bcftools mpileup + call)
    • Haplotype‐aware callers (GATK HaplotypeCaller, FreeBayes)
    • Long‐read callers (Medaka, Clair3)
  • VCF Format Deep Dive

    • Mandatory columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), plus FORMAT and per‐sample columns when genotypes are present
    • INFO and FORMAT subfields
    • Genotype encoding (GT, DP, GQ)
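To make the column layout and genotype encoding concrete, here is a minimal Python sketch that splits one VCF data line into its fixed columns and decodes the per-sample fields. The record is made up for illustration, and the helper names (`parse_info`, `parse_record`, `decode_gt`) are hypothetical, not part of bcftools or any library; real pipelines would normally use an established parser.

```python
# Minimal sketch: split one VCF data line into its fixed columns and
# decode the per-sample FORMAT fields. The record below is invented.

FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info):
    """Turn 'DP=31;DB' into {'DP': '31', 'DB': True}; '.' means no INFO."""
    if info == ".":
        return {}
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flag entries carry no value
    return fields


def parse_record(line):
    """Parse one tab-separated VCF data line into a dict."""
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(FIXED_COLUMNS, cols[:8]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")
    record["INFO"] = parse_info(record["INFO"])
    # FORMAT (column 9) names the colon-separated keys of every sample column.
    keys = cols[8].split(":") if len(cols) > 8 else []
    record["samples"] = [dict(zip(keys, s.split(":"))) for s in cols[9:]]
    return record


def decode_gt(gt, ref, alts):
    """Map a GT string such as '0/1' to allele sequences; '.' becomes None."""
    alleles = [ref] + alts  # index 0 is REF, 1..n are the ALT alleles
    phased = "|" in gt
    calls = [None if i == "." else alleles[int(i)]
             for i in gt.replace("|", "/").split("/")]
    return calls, phased


line = "chr1\t1042\t.\tA\tG,T\t57.3\tPASS\tDP=31;DB\tGT:DP:GQ\t0/1:15:48\t1|2:16:60"
rec = parse_record(line)
first = decode_gt(rec["samples"][0]["GT"], rec["REF"], rec["ALT"])
second = decode_gt(rec["samples"][1]["GT"], rec["REF"], rec["ALT"])
print(first)   # (['A', 'G'], False)
print(second)  # (['G', 'T'], True)
```

The first sample is an unphased heterozygote (REF plus the first ALT); the second is phased and carries both ALT alleles.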
  • Basic VCF Operations (bcftools & vcftools)

    • View, filter & query (bcftools view, bcftools query, vcftools --recode)
    • Sort & index (bcftools sort, bcftools index)
    • Summary stats (bcftools stats, vcftools --freq)
  • Variant Filtering Strategies

    • Hard filters by QUAL, DP, MQ, etc.
    • Variant Quality Score Recalibration (VQSR) in GATK
    • Filtering by functional impact
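The idea behind hard filtering can be sketched in a few lines of Python: each site is checked against fixed cut-offs, and every failed check contributes a label to the FILTER value. The thresholds and the filter names (`LowQual`, `LowDP`, `LowMQ`) are assumptions chosen for illustration, not recommended values; suitable cut-offs depend on the caller, the sequencing depth and the data type.

```python
# Hypothetical thresholds for illustration only.
THRESHOLDS = {"min_qual": 30.0, "min_dp": 10, "min_mq": 40.0}


def hard_filter(qual, info, thresholds=THRESHOLDS):
    """Return 'PASS' or a semicolon-joined list of failed filter names.

    `qual` is the QUAL column as a float (or None if missing) and `info`
    is a dict of INFO values as strings. An INFO key that is absent is
    not treated as a failure in this sketch.
    """
    failed = []
    if qual is None or qual < thresholds["min_qual"]:
        failed.append("LowQual")
    if "DP" in info and int(info["DP"]) < thresholds["min_dp"]:
        failed.append("LowDP")
    if "MQ" in info and float(info["MQ"]) < thresholds["min_mq"]:
        failed.append("LowMQ")
    return ";".join(failed) if failed else "PASS"


# Three invented sites: (QUAL, INFO annotations)
sites = [
    (57.3, {"DP": "31", "MQ": "60"}),
    (12.0, {"DP": "4", "MQ": "58"}),
    (45.0, {"DP": "22", "MQ": "21.5"}),
]
labels = [hard_filter(qual, info) for qual, info in sites]
print(labels)  # ['PASS', 'LowQual;LowDP', 'LowMQ']
```

Labelling failed sites rather than deleting them keeps the decision reversible, which is useful when thresholds are later revised.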
  • Annotation & Enrichment

    • SnpEff / VEP for effect prediction
    • Adding population frequency and clinical data
  • Merging & Comparing VCFs

    • Multi‐sample merging (bcftools merge)
    • Intersection & differences (bcftools isec)
    • Concordance metrics
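As a rough illustration of intersection, differences and concordance, the sketch below compares two call sets held as Python dicts keyed by (CHROM, POS, REF, ALT). The data are invented, the function names are hypothetical, and the two metrics shown (Jaccard overlap of sites and genotype agreement at shared sites) are only one possible choice of concordance measure. It assumes both call sets have already been normalised so that the same variant has the same key.

```python
def normalise_gt(gt):
    """Ignore phasing and allele order, so '1|0' and '0/1' compare equal."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def compare_callsets(calls_a, calls_b):
    """Compare two dicts mapping (chrom, pos, ref, alt) to a GT string."""
    keys_a, keys_b = set(calls_a), set(calls_b)
    shared = keys_a & keys_b
    union = keys_a | keys_b
    same_gt = sum(
        1 for k in shared if normalise_gt(calls_a[k]) == normalise_gt(calls_b[k])
    )
    return {
        "shared": len(shared),
        "only_a": len(keys_a - keys_b),
        "only_b": len(keys_b - keys_a),
        # Fraction of all observed sites that both call sets report.
        "jaccard": len(shared) / len(union) if union else None,
        # Fraction of shared sites where the genotypes agree.
        "genotype_concordance": same_gt / len(shared) if shared else None,
    }


calls_a = {
    ("chr1", 100, "A", "G"): "0/1",
    ("chr1", 200, "C", "T"): "1/1",
    ("chr1", 300, "G", "A"): "0/1",
    ("chr2", 50, "T", "C"): "0/1",
}
calls_b = {
    ("chr1", 100, "A", "G"): "1|0",
    ("chr1", 200, "C", "T"): "0/1",
    ("chr1", 300, "G", "A"): "0/1",
    ("chr2", 75, "G", "C"): "1/1",
}
result = compare_callsets(calls_a, calls_b)
print(result["shared"], result["only_a"], result["only_b"])  # 3 1 1
```

Here three sites are shared, one is private to each call set, and two of the three shared sites have matching genotypes.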
  • Population‐Level Analyses

    • Per‐site allele frequencies and the allele frequency spectrum (vcftools --freq, vcftools --counts)
    • Linkage disequilibrium (vcftools --geno-r2)
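The two statistics above can be sketched directly from genotype strings. The Python below computes the alternate allele frequency at a site and a genotype-based r² between two sites, defined here as the squared Pearson correlation of non-reference allele counts (0, 1 or 2 per diploid sample), which is the quantity the vcftools manual describes for `--geno-r2`. The genotypes are invented and the function names are hypothetical; this is a sketch of the calculation, not a reimplementation of vcftools.

```python
def alt_dosage(gt):
    """Count non-reference alleles in a GT string; None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for a in alleles if a != "0")


def alt_frequency(gts):
    """Non-reference allele frequency over diploid samples with called genotypes."""
    doses = [d for d in map(alt_dosage, gts) if d is not None]
    return sum(doses) / (2 * len(doses)) if doses else None


def genotype_r2(gts_a, gts_b):
    """Squared correlation of allele dosages at two sites (same sample order)."""
    pairs = [
        (a, b)
        for a, b in zip(map(alt_dosage, gts_a), map(alt_dosage, gts_b))
        if a is not None and b is not None  # keep samples called at both sites
    ]
    n = len(pairs)
    if n < 2:
        return None
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return None  # r2 is undefined when a site is monomorphic
    return cov * cov / (var_a * var_b)


# Genotypes of five samples at two invented sites
site1 = ["0/0", "0/1", "1/1", "0/1", "./."]
site2 = ["0/0", "0/1", "1/1", "0/0", "0/1"]

freq1 = alt_frequency(site1)
freq2 = alt_frequency(site2)
r2 = genotype_r2(site1, site2)
print(freq1, freq2)  # 0.5 0.4
```

The sample with a missing genotype at the first site is dropped from both the frequency at that site and the pairwise r², so the two statistics are computed over four and five samples respectively.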
  • VCF Visualization

    • Loading VCF tracks in IGV
    • Plotting variant density or impact in R
  • Hands‐On Exercise

    1. Call SNPs and indels on your assembled/simulated dataset
    2. Convert, filter & index the VCF
    3. Annotate with SnpEff and summarize key variant metrics