SNV - deaconjs/ThousandVariantCallersRepo GitHub Wiki
SNP Variant Callers
caller | pubyear | from | study | source | algorithm |
---|---|---|---|---|---|
graphtyper | 2017 | deCODE genetics | study | source | Population-scale genotyping using pangenome graphs |
muse | 2016 | MD Anderson Cancer Center | study | source | F81 Markov Substitution Model |
sinvict | 2016 | Simon Fraser University, Canada | study | source | |
multigems | 2016 | University of California, Riverside | study | source | Multinomial Bayesian, base and alignment quality priors |
somaticseq | 2015 | Roche Bina | study | source | meta-caller, decision tree |
discosnp | 2015 | Genscale France | study | source | reference-free, de bruijn graph |
2kplus2 | 2015 | Norwich Research Park, UK, Sainsbury lab | study | source | reference-free, de bruijn graph |
exscalibur | 2015 | University of Chicago | study | source | |
multisnv | 2015 | Cambridge Tavare | study | source | joint paired, timepoint pooling |
rarevator | 2015 | University of Florence | study | source | Fisher's exact test, conserved loci only |
snv-ppilp | 2015 | University of Helsinki, Finland | study | source | perfect phylogeny/integer linear programming |
platypus | 2014 | U Oxford | study | source | Haplotype, bayesian, multi-sample, local realignment |
baysic | 2014 | Baylor/Genformatic LLC | study | source | Meta-caller, Bayesian, unsupervised |
hapmuc | 2014 | Kyoto University, Japan | study | source | Haplotype, Bayesian HMM |
snpest | 2014 | U Copenhagen | study | source | reference-free, generative probabilistic |
variantmaster | 2014 | Geneva Medical School, Switzerland | study | source | reference-free, pedigree inference |
mutect | 2013 | Broad Getz | study | source | Beta-binomial, Variable Allele Fraction, filter population SNPs |
niks | 2013 | Max Planck Institute for Plant Breeding Research, Germany | study | source | |
ebcall | 2013 | Vanderbilt Zhao | study | source | Heuristic, multiple feature |
shearwater | 2013 | U Cambridge/Wellcome Trust | study | source | Beta-binomial, DeepSNV with aggregate control counts |
shimmer | 2013 | NHGRI Larsen | study | source | Fisher's exact test, variant read count > N |
bubbleparse | 2013 | Norwich Research Park Sainsbury Lab, UK | study | source | Reference-free, de Bruijn graph |
cake | 2013 | Wellcome Trust Adams | study | source | Meta-caller, simple 2x consensus, post-filter |
denovogear | 2013 | WashU St Louis Conrad | study | source | Beta-binomial, pedigree |
qsnp | 2013 | U Queensland | study | source | Heuristic, min 3 reads, post-filter |
rvd | 2013 | Stanford University School of Medicine | study | source | Beta-binomial |
seurat | 2013 | Translational Genomics Research Institute | study | source | Joint-paired, beta-binomial |
snptools | 2013 | Baylor College of Medicine | study | source | Haplotype, Bayesian HMM |
vcmm | 2013 | RIKEN Japan | study | source | Multinomial Bayesian, priors corrected Illumina q-score |
vip | 2013 | Case Western, Li lab | study | source | Overlapping Pools |
virmid | 2013 | UCSD Bafna | study | source | Joint-paired, Beta-binomial, purity estimation |
varscan2 | 2012 | WashU St Louis Wilson | study | source | Heuristic, min 3 reads, filter |
jointsnvmix | 2012 | U British Columbia Vancouver | study | source | Joint-paired, Beta-binomial |
lofreq | 2012 | Genome Institute of Singapore | study | source | Joint-paired, Poisson-binomial |
strelka | 2012 | Illumina | study | source | Joint-paired, multinomial Bayesian |
atlas2 | 2012 | Baylor Yu | study | source | Heuristic, reads ratio plus filter |
conan-snv | 2012 | British Columbia Cancer Agency, Canada | study | source | CNV-informed SNV calls |
cortex | 2012 | Wellcome Trust, University of Oxford, UK | study | source | Reference-free, de Bruijn |
deepsnv | 2012 | ETH Zurich | study | source | Probabilistic beta-binomial |
gems | 2012 | UCal Riverside | study | source | Multinomial Bayesian, base- and alignment-quality priors |
impute2 | 2012 | University of Chicago | study | source | Haplotype |
somatic_sniper | 2011 | Wash U St Louis Ding | study | source | Joint-paired, multinomial Bayesian |
bambino | 2011 | "NCI L Population Genetics Buetow" | study | source | Heuristic, multiple features |
freebayes | 2011 | Erik Garrison | study | source | Haplotype, multi-allelic, non-uniform copy number |
mutationseq | 2011 | "U British Columbia Vancouver Shah" | study | source | Machine learning |
snver | 2011 | New Jersey I of T | study | source | Overlapping pools |
syzygy | 2011 | Broad | study | source | Probabilistic, strand, sequence context, neighborhood quality score priors |
vipr | 2011 | Max Planck Institute | study | source | Overlapping pools |
crisp | 2010 | Scripps Translational Science Institute, US | study | source | Overlapping pools |
indelocator | | CGA/Broad | | source | |
graphtyper
Validated vs: GATK (UG, UGLite, HC, HC joint), Samtools, Platypus, FreeBayes
Used by: deCODE genetics on WGS of > 28,000 Icelanders
Notes: fast, highly scalable, includes HLA typing
Algorithm: Iterative creation of local pangenome graphs for accurate re-alignment of reads to all possible haplotypes.
muse
Validated vs: mutect, sniper, strelka
Used by: GDC, SomaticSeq
Algorithm: co-local realignment of paired normal/tumor reads, pre-filter, estimate allele equilibrium frequencies and evolutionary distance with the F81 Markov substitution model, weigh frequencies against a sample-specific error model, require higher stringency at dbSNP locations
Notes: produces somatic calls. should give competitive performance on impure samples
Description: Markov Substitution model for Evolution (MuSE), which models the evolution of the reference allele to the allelic composition of the tumor and normal tissue at each genomic locus. We further adopt a sample-specific error model to identify cutoffs, reflecting the variation in tumor heterogeneity among samples.
sinvict
Validated vs: MuTect, VarScan2, Freebayes
Notes: Captures very low allele frequencies, to detect mutations in free-floating tumor DNA. Can do time-series analysis. No confidence score assigned.
multigems
Validated vs: freebayes, gatk, samtools, varscan
Notes: assumes diploid. Multiple-sample version of GeMS.
Description: estimates sample genotypes and genotype probabilities for possible SNV sites. Unlike other popular multiple sample SNV callers, the MultiGeMS statistical model accounts for enzymatic substitution sequencing errors. Also, in consideration of the multiple testing problem associated with SNV calling, SNVs are called using a local false discovery rate (lFDR) estimator. Further, MultiGeMS utilizes high performance computing (HPC) techniques for computational efficiency and is robust to low-quality sequencing and alignment data.
somaticseq
Validated vs: mutect/indellocator, varscan2, somaticsniper, jointsnvmix2, vardict
Notes: calls SNV/indel. produces somatic calls. meta caller, AI consensus
Description: Collects up to 72 features per mutation by SAMtools, HaplotypeCaller, and five orthogonal variant callers. The Adaptive Boosting model constructs a decision tree classifier that yields P for each variant.
discosnp
Validated vs: niks, bubbleparse, cortex
Notes: Calls SNV/indel. FastQ input. de novo calls. Uses Cortex. Ranks predictions. Compute efficient.
Description: designed to call isolated SNPs directly from sequenced reads, without a reference genome. ... DISCOSNP finds and ranks high quality isolated heterozygous or homozygous SNPs from any number of read sets, from 1 to n. It introduces new features to distinguish SNPs from sequencing errors or false positives due to approximate repeats. DISCOSNP can be used for finding high-quality isolated SNPs, either heterozygous, e.g. to build databases of high-quality markers within and across populations, or homozygous between individuals/strains, e.g. to create discriminant markers.
2kplus2
Notes: detects SNV/SV. Cortex input. Reference free de novo de Bruijn graph.
Description: begins by producing a tree of kmers for an input read set picking a seed k-mer and assuming that it lies on one path through a SNP and then looks for an opposite kmer, one substitution different, which would lie on another path through the bubble. If this can be found in the k-mer tree, then a recursive algorithm builds paths left and right of each k-mer until they join or no k-mer can be found. Further to graph structure, the attributes of the sample and sampled sequence reads can be used.
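The seed-and-opposite k-mer lookup described above can be sketched as follows. This is only an illustration under stated assumptions: a plain Python set stands in for the k-mer tree, the function names are hypothetical, and the recursive left/right path extension is not shown.

```python
def kmers(reads, k):
    """Collect the set of all k-mers found in the reads."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def opposite_kmers(seed, kmer_set):
    """Return (position, k-mer) pairs for k-mers in kmer_set that differ
    from the seed by exactly one substitution."""
    found = []
    for pos, base in enumerate(seed):
        for alt in "ACGT":
            if alt == base:
                continue
            candidate = seed[:pos] + alt + seed[pos + 1:]
            if candidate in kmer_set:
                found.append((pos, candidate))
    return found


reads = ["ACGTACG", "ACGAACG"]  # toy reads differing at one base
kmer_set = kmers(reads, 5)
pairs = opposite_kmers("CGTAC", kmer_set)
```

Each hit marks a candidate bubble; a real implementation would then extend both k-mers left and right until the two paths rejoin.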
exscalibur
Notes: Reports the union of multiple pipelines
Description: WES analysis pipelines for the detection of germline and somatic mutations, with the implementation of three aligners, six germline callers, and six somatic callers. It automates the full analysis workflow from raw sequencing reads to annotated variants and provides an interactive visualization of the results
multisnv
Notes: somatic calls
Validated vs: SomaticSniper, MuTect, UnifiedGenotyper and Platypus
Algorithm: probabilistic, timepoint-pooling, multiple samples from same patient
Description: a somatic variant caller that extends pairwise analysis of tumour-normal pairs to joint analysis of multiple samples from the same patient ... multiSNV calls somatic SNVs across all available same-patient samples without pooling reads. It is based on a Bayesian framework that captures the relatedness between samples by modelling the probability of a mutation in a given sample, conditioned on the somatic status of all other samples.
rarevator
Validated vs: mutect, varscan2
Notes: outputs SNV/indel. filter only, GATK UG input. somatic calls. validation only mentions how many new variants were called by rarevator, not how many were missed
Algorithm: Fisher exact test on conserved loci from hg19
snv-ppilp
Notes: Filter only, gatk ug vcf input.
Algorithm: perfect phylogeny/integer linear programming
Description: a tool for refining GATK’s Unified Genotyper SNV calls for multiple samples. We assume these samples form a character-based phylogeny, the characters being the SNVs reported by GATK. As in Salari et al. (2013), we work with the perfect phylogeny model; however, we have a new problem formulation for fitting GATK’s calls to such a phylogeny, which we solve exactly using integer linear programming (ILP).
platypus
Validated vs: gatk ug/hc, samtools
Compared in: Bro5
Used by: bcbio, bioconda
Notes: docs. calls SNV/SV/indel. somatic calls. haplotype-based. No dependencies, fast. Also see Somatypus.
baysic
Notes: input is paired vcfs only.
Algorithm: meta, unsupervised bayesian consensus
Description: uses a Bayesian statistical method based on latent class analysis to combine variant sets produced by different bioinformatic packages (e.g., GATK, FreeBayes, Samtools) into a high-confidence set of genome variants.
hapmuc
Validated vs: VarScan 2, SomaticSniper, Strelka and MuTect
Notes: SNV/indel. pileups input. somatic calls
Algorithm: bayesian model on haplotype inference
Description: two generative models under a Bayesian statistical framework: one represents true somatic mutations and the other regards candidate somatic mutations as errors. In our generative models, we prepared four candidate haplotypes by combining a candidate mutation and a heterozygous germ line variant, if available. The alignment probabilities of the observed reads given each candidate haplotype were then computed by using profile hidden Markov models. Next, we inferred the haplotype frequencies and calculated the marginal likelihoods by using a variational Bayesian algorithm. Finally, we derived a Bayes factor, which is the ratio of the marginal likelihoods of these two models, to evaluate the possibility of the presence of somatic mutations.
snpest
Validated vs: GeMS, freebayes, GATK HC, samtools
Notes: SNV/indel. pileups input. confidence score ranks predictions, does not model aneuploidy
Algorithm: reference-free probabilistic model, generative probabilistic graphical model
Description: models the genotyping and SNP calling from the raw read sequences in a fully probabilistic framework. The problem is described using a generative probabilistic graphical model
variantmaster
Notes: SNV/indel. somatic, de novo calls
Algorithm: reference-free probabilistic model, inference through inheritance
Description: uses raw sequence data information available in BAM files (binary sequence alignment/map format) in addition to the variants reported in the VCF files. BAM and VCF files can be generated using standard tools such as BWA, SAMtools, or GATK with default parameters. More specifically, for each variant in each affected individual, the algorithm estimates the strand bias and the probability that each family member is a carrier accounting for the respective fraction of supporting reads and the corresponding base call error rate
mutect (1 & 2)
Validated vs: somatic sniper, jointSNVmix, strelka
Used by: GDC, SomaticSeq, bcbio, rave
Compared in: Den9, Wash7, Bcb8, Barc2, Van6, Gor4, Swi9
Algorithm: bayesian with variable allele fraction, filter variants appearing in normal pool unless they are known variants. paired but not joint calling. No confidence score.
Notes: population-based calls. sensitive for low allelic frequency
Description: Bayesian classifier designed to detect somatic mutations with very low allele-fractions, requiring only a few supporting reads, followed by a set of carefully tuned filters
niks
Notes: Identifies mutagen-induced mutations in paired samples.
Algorithm: reference free. map all reads to k-mers and count frequency changes between paired samples, reassemble
Description: reference-free genome comparison based solely on the frequencies of short subsequences within whole-genome sequencing data. It is geared toward identifying mutagen-induced, small-scale, homozygous differences between two highly related genomes, independent of their inbred or outbred background, and provides a route to identification of mutations without requiring any prior information about reference sequences or genetic maps
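The k-mer frequency comparison described above can be illustrated with a minimal sketch. It assumes exact k-mer matching, ignores reverse complements and the reassembly step, and uses hypothetical function names and a hypothetical `min_count` cutoff.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def sample_specific_kmers(reads_a, reads_b, k, min_count=2):
    """Return k-mers seen at least min_count times in one sample and never
    in the other (min_count is an illustrative error filter)."""
    a, b = count_kmers(reads_a, k), count_kmers(reads_b, k)
    only_a = {km for km, n in a.items() if n >= min_count and b[km] == 0}
    only_b = {km for km, n in b.items() if n >= min_count and a[km] == 0}
    return only_a, only_b


only_a, only_b = sample_specific_kmers(["ACGTAC"] * 2, ["ACGAAC"] * 2, k=4)
```

K-mers private to one sample cluster around the differing base, which is what makes them useful seeds for local reassembly.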
ebcall
Notes: outputs SNV/indel. exome-only? population based calls. doesn't output vcfs. sensitive for low allelic frequency
Validated vs: varscan 2, somatic sniper
Algorithm: heuristic with beta-binomial error model from pooled normal bams, simply subtract germ line variants for somatic
Description: empirically estimating the distribution of sequencing errors by using a set of non-paired normal samples. Using this approach, we can directly evaluate the discrepancy between the observed allele frequencies and the expected scope of sequencing errors
shearwater
Validated vs: caveman, mutect, deepsnv
Notes: population-based calls. for targeted sequencing.
Algorithm: beta-binomial model for variant calling with multiple samples
Description: exploits the power of a large sample set for precisely defining the local error rates and which uses prior information to call variants with high specificity and sensitivity.
shimmer
Validated vs: varscan 2, somatic sniper, deepsnv, jointSNVmix 2
Algorithm: Fisher's exact test with multiple testing correction
Notes: somatic calls. employs a statistical model quite similar to that of Varscan 2, but in addition to this it performs a correction for multiple testing
Description: If the total number of reads displaying a non-reference allele in the two samples is greater than a minimum threshold nvar, a Fisher’s exact test is performed to test the null hypothesis that variant alleles are distributed randomly between the two samples
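The thresholded Fisher's exact test described above can be sketched in pure Python. The function names and the default `nvar=5` are assumptions for illustration, not Shimmer's actual interface or default, and the multiple-testing correction is not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def site_p_value(tumor_ref, tumor_alt, normal_ref, normal_alt, nvar=5):
    """Return the p-value for one site, or None when the combined variant
    read count does not exceed the threshold nvar."""
    if tumor_alt + normal_alt <= nvar:
        return None
    return fisher_exact_two_sided(tumor_alt, tumor_ref, normal_alt, normal_ref)


p = site_p_value(tumor_ref=20, tumor_alt=8, normal_ref=28, normal_alt=0)
```

A small p-value indicates that variant reads are unevenly split between the two samples, as expected for a somatic change.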
bubbleparse
Validated vs: cortex, samtools
Notes: Outputs SNV/SV. Cortex input
Description: minimal error-cleaning routine, followed by a depth-first search in the graph to find bubbles.
cake
Notes: somatic calls
Validated vs: bambino, caveman, mpileup, varscan2
Algorithm: meta-caller - merge, consensus, filter.
Description: integrates four publicly available somatic variant-calling algorithms to identify single nucleotide variants, Bambino, CaVEMan, SAMtools mpileup, and VarScan 2 with extra filtering
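The merge-and-consensus step can be sketched as below, assuming each caller's output has already been reduced to a collection of `(chrom, pos, ref, alt)` tuples. The function name and data layout are hypothetical, and the post-filtering step is omitted.

```python
from collections import Counter


def consensus_calls(callsets, min_callers=2):
    """Keep variants reported by at least min_callers of the given callers.

    callsets maps a caller name to an iterable of (chrom, pos, ref, alt).
    """
    counts = Counter()
    for calls in callsets.values():
        counts.update(set(calls))  # set() so a caller counts once per variant
    return sorted(v for v, n in counts.items() if n >= min_callers)


callsets = {
    "bambino": [("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")],
    "caveman": [("chr1", 100, "A", "T")],
    "mpileup": [("chr1", 300, "C", "G")],
    "varscan2": [("chr1", 200, "G", "C"), ("chr1", 300, "C", "G")],
}
kept = consensus_calls(callsets)
```

Raising `min_callers` trades sensitivity for specificity.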
denovogear
Notes: outputs SNV/indel. works with trios. de novo calls.
Validated vs: gatk, polymutt, samtools
Used in: biocondor
Algorithm: joint statistical analysis over multiple samples
Description: model consists of individual genotype likelihoods, transmission probabilities, and priors on the probability of observing a polymorphism or a de novo mutation at any given site in the genome
qsnp
Validated vs: GATK, strelka
Algorithm: heuristic; minimum of 3 reads, compare to in house database of variants
Notes: somatic calls. fast, easy to run on a cluster
Description: Classification into germline and somatic calls follows a number of simple rules that were designed to accommodate for the expected low mutant allele ratio in low purity tumors
rvd
Notes: MatLab. Detects very low frequency alleles
Description: We use a multi-reference, indexed experimental design to minimize experimental variance and characterize a position-specific error distribution. We employ a rigorous statistical model to estimate the position-specific error rate distribution for reference sequences and thus the probability of a true mutation at each position in the sample. The statistical model provides a rigorous framework for hypothesis testing and estimation that minimizes false positives in variant calling.
seurat
Notes: outputs SNV/indel/LOH/SV. somatic calls
Validated vs: varscan 2, strelka, somatic sniper
Description: calculates the joint posterior probability that a variant exists in the tumor sample and not in the normal sample. The resulting VCF file contains both SNVs and indels.
snptools
Algorithm: haplotype imputation, effective base depth, binomial mixture modeling
Notes: includes genotype likelihood estimation
Description: SNPTools is organized by functionality into four modules ... EBD calculation: It summarizes mapping and base quality information to improve computational performance and reduce storage space. SNP site discovery: The variance ratio statistic utilizes EBD information to provide high-quality SNP variant calls. ... GL estimation: BAM-specific parameter estimation allows this algorithm to overcome data heterogeneity due to platforms, reference bias (from mapping or capture), and low-quality data. Genotype/haplotype imputation: A constrained Li-Stephens population haplotype sampling schema
vcmm
Notes: Detects SNV/indel/SV. pileups input.
Validated vs: gatk, samtools
Algorithm: multinomial bayesian from paper in notes & strand bias filter
Description: The SNV calls were distinguished by the ratio of the probabilities that the minor allele at a nucleotide site is an error Perror and a major allele Pallele as described previously
vip
Validated vs: dna sudoku, overlap log
Algorithm: overlapping pools
Description: A complete data analysis framework for overlapping pool designs, with novelties in all three major steps: variant pool and variant locus identification, variant allele frequency estimation and variant sample decoding. VIP is very flexible and can be combined with any pool design approaches and sequence mapping/alignment tools.
virmid
Compared by: Den9
Notes: measures purity. Exome only. somatic calls
Validated vs: jointSNVmix 2, strelka, varscan 2
Algorithm: Estimate purity, bayesian inference with estimated joint genotype probability matrix as the prior distribution
Description: estimate α, the level of impurity, i.e. the admixture of stromal cells in the cancer sample. A maximum likelihood estimation method is used. Next, the most probable genotype is estimated in the somatic variant caller step, using a Bayesian algorithm.
varscan2
Compared by: Den9, Wash7, Bcb8, Aus4, Van6, Gor4, Swi9
Notes: Calls SNV/indel/CNV. Mpileups input. somatic calls.
Validated vs: Somatic Sniper
Used by: GDC, SomaticSeq, bcbio, rave, bioconda
Algorithm: Fisher's exact test, CBS algorithm for CNV, filters SNPs by heuristic criteria
Description: heuristic pairwise comparisons of base calls and normalized sequence depths at each position. Variants are classified into germline, somatic, LOH and unknown
jointsnvmix
Validated vs: compared to identical but non-joint and joint with fisher's exact test
Used in: SomaticSeq, rave
Algorithm: bayesian joint genotype of the samples
Notes: somatic calls. ranks the mutations. true joint calling
Description: probabilistic graphical model to analyse sequence data from tumour/normal pairs. allows statistical strength to be borrowed across the samples and therefore amplifies the statistical power to identify and distinguish both germline and somatic events in a unified probabilistic framework.
lofreq
Validated vs: snver, breseq, samtools, some custom methods
Used in: somaticseq, biocondor
Algorithm: poisson binomial with bernoulli trials
Notes: somatic calls. uses Bonferroni correction, targets deeply sequenced data with low allele frequencies (<0.05)
Description: models sequencing run-specific error rates to accurately call variants occurring in <0.05% of a population
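A Poisson-binomial test treats each read at a site as a Bernoulli trial with its own error probability. The sketch below computes the tail probability by dynamic programming and applies a Bonferroni correction. It assumes error probabilities come from Phred base qualities, and all function names are hypothetical rather than LoFreq's interface.

```python
def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def poisson_binomial_tail(error_probs, k):
    """P(at least k errors) when read i errs independently with
    probability error_probs[i]."""
    dist = [1.0]  # dist[j] = P(exactly j errors among reads processed so far)
    for p in error_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, pj in enumerate(dist):
            nxt[j] += pj * (1 - p)      # this read is correct
            nxt[j + 1] += pj * p        # this read is an error
        dist = nxt
    return sum(dist[k:])


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


quals = [30] * 1000                       # toy column: 1000 reads at Q30
probs = [phred_to_error(q) for q in quals]
p_raw = poisson_binomial_tail(probs, 10)  # 10 non-reference reads observed
p_adj = bonferroni(p_raw, n_tests=3)      # e.g. three alternative bases
```

The quadratic-time recursion is fine for a sketch; very deep columns would need pruning or an approximation.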
strelka
Compared by: Den9, Wash7, Barc2, Aus4, Van6, Gor4
Validated vs: varscan, samtools
Used in: SomaticSeq
Algorithm: bayesian joint probability of normal and somatic, indel realign
Notes: Outputs SNV/indel. somatic calls. works in presence of impurities, joint calling
Description: Bayesian approach wherein the tumor and normal allele frequencies are treated as continuous values. Search for candidate indels, realign, produce somatic variant probabilities. Strelka uses allele frequencies rather than diploid genotypes
atlas2
Validated vs: gatk ug, dindel, samtools mpileup
Algorithm: logistic regression model includes reference/variant reads ratio for calling, and variety of features for filtering
Notes: outputs SNV/indel. exome only? part of Genboree. fast. may be a windows app. works for SOLiD, Illumina, and Roche 454
Description: Estimates error as an 11 bp rolling-window average. Filters variants on uni-directional reads.
conan-snv
Notes: CNV-informed SNV. binomial mixture model, one per copy
Description: integrates information about copy number state of different genomic segments into the inference of single nucleotide variants. CoNAn-SNV requires as input a pileup file (either Maq or Samtools format) and model parameters, as well as a file demarcating segmentation boundaries of copy number amplifications
cortex
Notes: Detects SNV/SV.
Algorithm: reference-free de bruijn graph
Description: extend classical de Bruijn graphs by colouring the nodes and edges in the graph by the samples in which they are observed. This approach accommodates information from multiple samples, including one or more reference sequences and known variants.
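The colouring idea can be illustrated with a minimal sketch that tracks node colours only. It assumes a dictionary from k-mer to the set of sample names in which the k-mer occurs; edges, coverage, and reverse complements are left out, and the names are hypothetical.

```python
from collections import defaultdict


def coloured_kmers(samples, k):
    """Map each k-mer (graph node) to the set of samples (colours) containing it.

    samples maps a sample name to a list of reads or sequences.
    """
    colours = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                colours[read[i:i + k]].add(name)
    return colours


graph = coloured_kmers({"ref": ["ACGTA"], "sample1": ["ACGAA"]}, k=3)
shared = {km for km, c in graph.items() if len(c) == 2}
private = {km for km, c in graph.items() if c == {"sample1"}}
```

Nodes carrying a sample colour but not the reference colour mark where that sample's path diverges from the reference.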
deepsnv
Compared by: Swi9
Notes: SNV/indel. for targeted sequencing. population-based calls
Validated vs: varscan 2, crisp, vipr
Used by: Biocondor
Algorithm: beta-binomial model, error model from population data
Notes: uses population data, fast due to C implementation
Description: Model for error distribution is based on the observation that sequencing artifacts are recurrent on specific loci. In a large cohort this makes it possible to define a background error distribution on each locus, above which true variants can be called.
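To make the idea of a per-locus background error distribution concrete, the sketch below evaluates how surprising an observed variant count is under a beta-binomial error model. The `alpha` and `beta` values are placeholders that would in practice be fitted from cohort or control data, and this tail probability is an illustration, not the exact test deepSNV performs.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(k variant reads out of n) under a beta-binomial(alpha, beta) model."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def beta_binomial_tail(k, n, alpha, beta):
    """P(at least k variant reads out of n) under the background model."""
    return sum(beta_binomial_pmf(j, n, alpha, beta) for j in range(k, n + 1))


# Placeholder background: mean error rate alpha / (alpha + beta) = 0.001.
p_background = beta_binomial_tail(k=12, n=500, alpha=1.0, beta=999.0)
```

A count far in the tail of the background distribution is a candidate true variant at that locus.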
gems
Notes: pileups input.
Validated vs: varscan2, snvmix2, freebayes, maq, samtools, gatk, atlas, soapsnp
Algorithm: bayesian multinomial, base- and alignment-quality priors, Dixon's Q-test
Notes: max of 2 alleles
Description: statistical model accounts for enzymatic substitution sequencing errors, addresses the multiple testing problem
impute2
Used by: biocondor
Algorithm: haplotype imputation
Description: statistically estimate the haplotypes underlying the GWAS genotypes (“pre-phasing”), then impute into these haplotypes as if they were correct
somatic_sniper
Compared by: Den9, Wash7, Aus4, Van6, Gor4, Swi9
Notes: somatic calls.
Validated vs: snvmix 2
Used by: GDC, SomaticSeq, rave, bioconda
Algorithm: basic joint probability bayesian genotyping
Description: like MuTect, is based on a Bayesian posterior probability. Somatic Sniper reports a somatic score (SSC), a Phred-scaled probability between 0 and 255, that the tumor and normal genotypes are different
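A Phred-scaled score of this kind can be sketched as follows, assuming the conventional transform of -10 times log10 of the probability that the two genotypes are the same, rounded and clamped to 0..255. The function name is hypothetical.

```python
import math


def somatic_score(p_same_genotype, cap=255):
    """Phred-scale the probability that tumor and normal genotypes are
    identical, clamped to the range 0..cap."""
    if p_same_genotype <= 0:
        return cap  # probability underflowed to zero: report the maximum
    return min(cap, max(0, round(-10 * math.log10(p_same_genotype))))


scores = [somatic_score(p) for p in (1.0, 0.1, 0.001, 1e-40)]
```

Higher scores mean stronger evidence that the tumor genotype differs from the normal genotype.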
bambino
Notes: outputs SNV/indel. somatic calls
Algorithm: basic. some filters
Description: Bambino's variant detector and assembly viewer are capable of pooling and analyzing data from multiple BAM files simultaneously.
freebayes
Notes: outputs SNV/indel/MNPs
Used by: bcbio, biocondor
Algorithm: haplotype-aware bayesian inference with multiallelic loci and non-uniform copy number across the samples
Description: generalize the Bayesian statistical method described by Marth to allow multiallelic loci and non-uniform copy number across the samples under consideration.
mutationseq
Notes: Somatic calls. Available at command line for JointSNVMix
Validated vs: samtools, gatk ug
Algorithm: classic machine learning for somatic calling
Description: Comparison of four classic machine learning algorithms toward SNV calling
snver
Validated vs: CRISP, samtools, gatk
Algorithm: model minor alleles from pooled cancer/normal samples, using binomial dist
Notes: somatic calls. fast, early paired model, reports p-val
Description: statistical tool SNVer for calling SNPs in analysis of pooled or individual NGS data. Different from the previous models employed by CRISP, it analyzes common and rare variants in one integrated model, which considers and models all relevant factors including variant distribution and sequencing errors simultaneously.
syzygy
Notes: mpileup input. population-based calls.
Algorithm: multinomial bayesian with filters
Description: empirical modeling of the sequencing error processes and filters to remove sites with strand inconsistency or clusters of variants suggestive of read misalignment
vipr
Notes: mpileup input. outputs SNV/deletions. population-based calls. uses pooled samples
Validated vs: crisp, poisson, varscan
Description: vipR identifies sequence positions that exhibit significantly different minor allele frequencies in at least two DNA pools using the Skellam distribution.
crisp
Notes: outputs SNV/indel
Algorithm: Fisher's exact test
Description: identify rare variants by comparing the distribution of allele counts across multiple DNA pools using contingency tables. To detect common variants, we utilize individual base-quality values to compute the probability of observing multiple non-reference base calls due to sequencing errors alone. Additionally, we incorporate information about the distribution of reads on the forward and reverse strands and the size of the pools to filter out false variants.
indelocator
Compared by: Den9
Used by: SomaticSeq
Notes: not published