SV - deaconjs/ThousandVariantCallersRepo GitHub Wiki

Structural Variant Callers

Caller	Year	From	Study	Source	Algorithm
popins	2017	deCODE genetics	study	source	Assembly of unmapped reads across multiple samples, placing of contigs into reference genome, genotyping
svaba	2017	Broad Institute	study	source	Discordant reads, classify, follow split reads to pool reads, assemble contigs
valor	2017	Bilkent University	study	source	Long-range sequencing
novobreak	2017	University of Texas MD Anderson Cancer Center	study	source	de bruijn kmer hash to filter and assemble mutant contigs
vardict	2016	AstraZeneca	study	source	split-read start, paired end, soft clipping, explicit alignment
gridss	2016	Walter and Eliza Hall Institute of Medical Research	study	source	positional de bruijn graph breakend assembly, probabilistic scoring of assembly, split read, and read pair evidence
sv-bay	2016	Curie Institute	study	source	discordant reads, classify, read coverage, bayesian caller
cosmos	2016	Advanced Industrial Science and Technology (AIST), Tokyo	study	source	discordant reads, classify, read coverage, score by binom dist on flanking read depths
svstat	2016	Baylor College of Medicine	study	source
svelter	2016	University of Michigan	study	source
skald	2016	University of Kansas Medical Center	study	N/A
devro	2016	Uppsala University	study	N/A
seq2c	2015	AstraZeneca	study	source
manta	2015	Illumina	study	source	split-read start, paired end, soft clipping, classification, assembly
metasv	2015	Bina/Roche	study	source	meta-caller
wham	2015	U of Utah	study	source	soft-clipping, cluster alternative alignments from bwa output
breakmer	2015	Broad, MacConaill	study	source	split-read start, paired end, soft clipping, assemble kmer collections of sv reads
indelminer	2015	Penn State U	study	source	split-read start, paired end, soft clipping, explicit alignment (indel)
breakseek	2015	Chinese Academy of Sciences	study	source	soft-clipping, paired-end classification, sophisticated probabilistic scoring for indels
raptr-sv	2015	USDA	study	source	split-read start, paired end, soft clipping
scanindel	2015	University of Minnesota	study	source	soft-clipping, binomial distribution for breakpoints, assembly/mapping, freeBayes
speedseq	2015	WashU St Louis, Hall	study	source
lumpy	2014	U Virginia	study	source	breakpoint probability map, read-pair, split read, read-depth
scalpel	2014	Cold Spring Harbor, Simons Center for Quantitative Biology	study	source	de bruijn graph, iterative k-mer adjustment
gindel	2014	U Connecticut	study	source	Support Vector Machine (SVM)
gustaf	2014	Freie U, Berlin	study	source	split-read start, align unmapped reads for breakpoints
smufin	2014	Barcelona Supercomputing Center	study	source	somatic reference free, quaternary sequence tree
socrates	2014	The Walter and Eliza Hall Institute of Medical Research	study	source	split-read start, merge clusters
ulysses	2014	Lab of Computational and Quantitative Biology, Paris	study	source
vivar	2014	Ghent U, Belgium	study	source
bellerophon	2013	Case Western	study	source	discordant reads, interchromosomal, soft-clipping breakpoints
sv-m	2013	Max Planck Institute	study	source
pesv-fisher	2013	Center for Genomic Regulation, Spain	study	source
isvp	2013	Tohoku University, Japan	study	source	meta-caller
meerkat	2013	Harvard, Park	study	source	discordant reads, classify, recognize specific complex events e.g. repair pathways
soapindel	2013	BGI Shenzhen	study	source	de bruijn graph, identifies breakpoints from discordant reads
softsearch	2013	Mayo Clinic, Kocher	study	source	soft-clipping, heuristic, number of soft-clipped reads
tigra	2013	WashU MD Anderson Cancer Center, Weinstock	study	source	de bruijn graph, requires input break points
delly	2012	EMBL	study	source	discordant reads, classify, adds long-range mate pairs, split reads to get breakpoint
cn.mops	2012	Johannes Kepler U, Austria	study	source
battenberg	2012	Wellcome Trust Sanger Institute, UK	study	N/A
breakpointer	2012	Max Planck Institute, Haas	study	source
clever	2012	Life Sciences Group, Amsterdam	study	source	discordant reads, cluster on concordant pairs
forestsv	2012	University of California San Diego, Sebat lab	study	source
gasvpro	2012	Brown U	study	source
hugeseq	2012	Stanford University	study	source	plural caller with simple aggregation and voting
prism	2012	U Toronto	study	source
splitread	2012	Howard Hughes Medical Institute	study	source	split-read start, hamming distance, de novo via read depth
svm2	2012	University of Milan	study	source
clipcrop	2011	U Tokyo	study	source
crest	2011	St Jude, Zhang	study	source	soft-clipping, binomial distribution, assembly/mapping
genomestrip	2011	Broad, McCarroll	study	source	discordant reads, reassemble by allele, read-depth, breakpoint database
ingap-sv	2011	Chinese Academy of Sci, Zhao	study	source	depth of coverage, paired end
hydra	2010	U Va	study	source	split-read start, paired end
age	2010	Yale, Gerstein	study	source
slope	2010	WashU St Louis, Pfiefer	study	source
svdetect	2010	Curie Institute	study	source	discordant reads, classify, sliding window clustering, mate-pair
svmerge	2010	Sanger Institute	study	source	meta-caller
pindel	2009	EMBL, Ning	study	source	split-read start
breakdancer	2009	WashU St Louis, Mardis	study	source	discordant reads, classify, maq-based
breakseq	2009, 2015	Yale, Gerstein	study	source	map reads to breakpoints
pemer	2009	Yale, Gerstein	study	source	split-read start, merge clusters

popins

Notes: Non-reference sequence insertions from short-reads, population-scale

Validated vs: MindTheGap, Pamir

Used on WGS data of > 15,000 Icelanders and included in Graph Genome Pipeline and Illumina Polaris

Algorithm: joint assembly of unmapped reads across samples

Description: Collects reads without good alignment to the reference genome, filters these reads for contamination, and assembles the remaining ones into contigs. Next it merges the contigs across samples, which improves the assembly of non-reference sequence insertions shared by several individuals. The merged contigs are anchored into the reference genome using paired-end information and exact breakpoints positions are determined using split alignment. Popins finishes by computing genotype likelihoods for all anchored contig ends in all samples.

svaba

Notes: low memory, fast

Algorithm: identifies clipped, discordant, and unmapped reads, split pairs, and reads with deletions or insertions in the CIGAR string. Discordant reads are re-aligned, filtered, and clustered. Split read partners are identified and reads are pooled. Grouped reads are then assembled into contigs with a specialized SGA assembler, and the contigs are aligned to the reference with bwa-mem. All constituent reads are re-aligned to the contig and to the reference, keeping the reads that match the contig better.

Description: performs local assembly to create consensus contigs from sequence reads that diverge from the reference, and applies this procedure to every region of the genome. The contigs are then compared to the reference to annotate the variants. By uniting the different classes of variant-supporting reads into a single framework, the authors expect this assembly-first approach to be effective for variants of all sizes and to require few parameters.

valor

Notes: long range sequencing e.g. 10X Genomics linked-read sequencing, pooled clone sequencing

Description: (variation using long range information). Briefly, valor searches for both read pair and split clone sequence signatures using the mapping locations of long range sequencing reads, and requires split clones from different pools to cluster at the same putative inversion breakpoints. Ambiguity due to multiple possible pairings of split clones is resolved using an approximation algorithm for the maximal quasi clique problem.

novobreak

Description: novoBreak algorithm divides tumor reads into shorter sections that are k nucleotides long (k-mers). By hashing and filtering out k-mers that match the reference and normal genomes, the algorithm identifies k-mers that are unique to the tumor and indicate breakpoints. Essentially a de Bruijn graph. Reads containing the unique k-mers are assembled into contigs local to the breakpoints, which are then aligned to the reference genome in order to infer exact breakpoints and their associated structural variants.

Description: obtains genome-wide local assembly of breakpoints from clusters of reads sharing a set of k-mers uniquely present in a subject genome but not in the reference genome or any control data... constructs a hash table from the tumor reads, containing all the k-mers, their host reads and frequencies in the set. Next, it filters out k-mers representing reference alleles or sequencing errors, and retains those representing variants... classifies the k-mers into 1) germline k-mers, those present in both the tumor and the normal genome, and 2) somatic k-mers, those present in the tumor but not the normal genome. Then, novoBreak identifies clusters of read pairs spanning each somatic breakpoint, and assembles each cluster of reads into contigs. By comparing the resulting high-quality contigs with the reference, novoBreak identifies breakpoints and associated SVs. Finally, novoBreak quantifies the amount of the supporting evidences at each breakpoint and outputs a final report.
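
The k-mer filtering step described above can be sketched in a few lines of Python. This is a minimal illustration only, not novoBreak's implementation: the function names, the `min_count` threshold (standing in for the sequencing-error filter), and the use of plain sets instead of a disk-backed hash are all assumptions, and reverse complements are ignored.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def count_kmers(reads, k):
    """Count k-mer occurrences across a collection of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

def somatic_kmers(tumor_reads, normal_reads, reference, k, min_count=2):
    """Return tumor k-mers absent from both the reference and the normal reads.

    min_count is a hypothetical stand-in for the error filter: k-mers seen
    fewer than min_count times in the tumor are treated as sequencing noise.
    """
    tumor = count_kmers(tumor_reads, k)
    normal = count_kmers(normal_reads, k)
    ref = set(kmers(reference, k))
    return {km for km, n in tumor.items()
            if n >= min_count and km not in ref and km not in normal}

reference = "AAACCCGGGTTT"
tumor_reads = ["AAACGGGT", "AAACGGGT"]   # reads spanning a small deletion
normal_reads = ["AAACCCGG"]
print(sorted(somatic_kmers(tumor_reads, normal_reads, reference, k=4)))
```

Reads containing any of the returned k-mers would then be the candidates for local assembly around the breakpoint.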

vardict

Notes: outputs SV/indel/SNV/LOH. somatic caller. Efficient with ultra deep seq. calls complex variants (same read, multiple vars). filters PCR artifacts. Estimates SV allele frequency.

Validated vs: GATK UG/HC, Freebayes, varscan, pindel, scalpel, manta, lumpy

Used by: SomaticSeq, RAVE, bcbio

Compared by: bcb8

Algorithm: consensus on realigned soft-clipped reads used as search query

Description: Calls SNV, MNV, InDels, complex and structural variants, performs local realignments on the fly. Performance scales linearly to sequencing depth. Performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings. Is able to detect PCR artifacts. Detects differences in somatic and loss of heterozygosity variants between paired samples.

gridss

Used by: bcbio

Description: performs alignment-constrained whole genome breakend assembly using a novel positional de Bruijn graph algorithm and a probabilistic structural variant caller that combines assembly, split read, and read pair evidence in a unified variant scoring model.

sv-bay

Validated vs: gasvpro, breakdancer, lumpy, delly

Notes: works with paired end or mate pair; analyzes tumor/normal pair concurrently

Algorithm: PEM & read coverage with bayesian testing for adjacency

Description: we combine both PEM signatures and information about changes in DOC in regions flanking each candidate rearrangement. Our method takes into account GC-content and mappability. The use of a Bayesian framework based on both PEM and DOC information allows us to significantly decrease the level of false positive predictions while retaining high sensitivity. Additionally, SV-Bay infers 15 different types of structural variant from the detected novel genomic adjacencies.

cosmos

Validated vs: Breakdancer, GasVPro, Delly, Lumpy

Notes: somatic caller. compares the statistics of the mapped read pairs in tumor samples with isogenic normal control samples in a distinct asymmetric manner. fast. mouse model for validation plus synthetic

Algorithm: discordant pair reads, classify, DOC binomial

Description: compares the mapping read status of paired-end short reads in a tumor sample with a normal sample in an asymmetric manner: groups of discordant read pairs, which are indicative of SVs, are generated from the tumor sample, following which the groups are filtered against individual discordant read pairs, instead of the group equivalents, in the normal sample to eliminate false positives. Next, we introduce the concept of strand-specific read depth, which allows prioritization of candidate SVs more efficiently than the conventional strand-independent read depth.

svstat

Description: we explored methods for quantifying support for nucleotide-resolved breakpoints of SVs without PE, SR, or DN. all reads are aligned to the reference genome... Recurrent alignment stop or start coordinates indicate candidate breakpoints... Candidate breakpoint regions are paired with each other to form a sequence “library” of candidate junctions. Stack reads are then aligned to the library of candidate SVs, and evidence for each candidate junction (C) is calculated based on 1) the number of bases in the tails aligned to the partner region, and 2) the quality scores of the alignments.

svelter

Notes: Handles complex rearrangements

Description: accurately resolve complex structural genomic rearrangements in whole genomes. Unlike previous “bottom up” strategies that search for deviant signals to infer structural changes, our “top down” approach works by virtually rearranging segments of the genomes in a randomized fashion and attempting to minimize such aberrations relative to the observed characteristics of the sequence data. In this manner, SVelter is able to interrogate many different types of rearrangements, including multi-deletion and duplication-inversion-deletion events as well as distinct overlapping variants on homologous chromosomes

skald

Notes: detects >50nt deletion structural variants

Description: combines calls from two tools (Breakdancer and GenomeStrip) with calibrated filters and clinical interpretation rules.

devro

Notes: population: paired end/depth of coverage

seq2c

Notes: From AstraZeneca, inactive

manta

Notes: produces SV and indel calls. runs on trios. Parallelized for clusters. handles degraded FFPE samples, uses pedigree-consistency and cosmic for validation. faster than delly

Validated vs: pindel, delly

Used by: bcbio, bioconda

Algorithm: breakend graph/assembly

Description: provides scoring models for germline analysis of diploid individuals and somatic analysis of tumor-normal sample pairs, with additional applications under development for RNA-Seq, de novo variants, and unmatched tumors. Runs in less than a tenth of the time that comparable methods require.

metasv

Validated vs: pindel, BreakSeq2, LUMPY, BreakDancer, Delly, CNVNator, MindTheGap

Used by: bioconda, bcbio

Algorithm: merge then assemble: pindel, BreakSeq2, LUMPY, BreakDancer, Delly, CNVNator, MindTheGap

Notes: consensus caller

Description: merging SVs from multiple tools for all types of SVs. It also analyzes soft-clipped reads from alignment to detect insertions accurately since existing tools underestimate insertion SVs. Local assembly in combination with dynamic programming is used to improve breakpoint resolution. Paired-end and coverage information is used to predict SV genotypes.
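
Merging calls from several tools needs some rule for deciding when two intervals describe the same event. The sketch below uses reciprocal overlap, a common convention; it is an assumption here, not a statement of how MetaSV merges calls. The 0.5 threshold and the function names are hypothetical, and the intervals are assumed to share chromosome and SV type and to have `end > start`.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions between intervals a and b."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))

def merge_calls(intervals, min_ro=0.5):
    """Greedily merge (start, end) intervals whose reciprocal overlap >= min_ro."""
    merged = []
    for start, end in sorted(intervals):
        for i, (m_start, m_end) in enumerate(merged):
            if reciprocal_overlap((start, end), (m_start, m_end)) >= min_ro:
                # Extend the existing call to the union of both intervals.
                merged[i] = (min(start, m_start), max(end, m_end))
                break
        else:
            merged.append((start, end))
    return merged

# Two tools report nearly the same deletion, a third reports a distant one.
print(merge_calls([(1000, 1100), (100, 200), (110, 210)]))
```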

wham

Notes: de novo calls. requires bwa. does association testing

Validated vs: lumpy, delly, softsearch

Used by: bcbio, bioconda

Compared by: Bcb8

Algorithm: mate-pair & split read mapping, soft-clipping, alternative alignment, consensus sequence based evidence

Description: pinpoints SVs in pooled and genotypic data associated with phenotypic variation. Uses split-read, mate-pair, and alternative alignments to find the other SV breakpoint. Positions in the pileup where three or more primary reads share the same breakpoint are interrogated as a putative SV. The SA and XA alignment tags are used as alternative alignment locations, and those are clustered. The clipped consensus is Smith-Waterman aligned to the alternative locations. Intra-chromosomal calls require a minimum of 2 reads.
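
The "three or more reads share the same breakpoint" rule can be illustrated by tallying soft-clip positions from CIGAR strings. This is a hedged sketch, not wham's code: the input tuples `(chrom, pos, cigar)` are assumed to be primary alignments with 1-based leftmost positions, and the function names are made up for the example.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def clip_positions(pos, cigar):
    """Reference coordinates at which an alignment is soft-clipped.

    A leading clip is reported at the first aligned base; a trailing clip
    at the first reference base after the alignment. Hard clips are ignored.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return []
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    found = []
    if ops[0][1] == "S":
        found.append(pos)
    if len(ops) > 1 and ops[-1][1] == "S":
        found.append(pos + ref_len)
    return found

def candidate_breakpoints(alignments, min_reads=3):
    """Sites where at least min_reads alignments are clipped at the same base."""
    support = Counter()
    for chrom, pos, cigar in alignments:
        for bp in clip_positions(pos, cigar):
            support[(chrom, bp)] += 1
    return {site: n for site, n in support.items() if n >= min_reads}

alignments = [
    ("chr1", 100, "50M20S"),
    ("chr1", 120, "30M40S"),
    ("chr1", 150, "20S50M"),
    ("chr1", 300, "70M"),
    ("chr1", 400, "10S60M"),
]
print(candidate_breakpoints(alignments))
```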

breakmer

Notes: For targeted sequencing. somatic calls.

Validated vs: crest, meerkat, breakdancer, pindel

Algorithm: soft clip, identify kmers in reads but not in reference

Description: uses a ‘kmer’ strategy to assemble misaligned sequence reads for predicting insertions, deletions, inversions, tandem duplications and translocations at base-pair resolution in targeted resequencing data. Variants are predicted by realigning an assembled consensus sequence created from sequence reads that were abnormally aligned to the reference genome

indelminer

Notes: outputs indels only. simple de novo, somatic calls. validation against synthetic variants introduced to chr22 and the na18507 data set. recommended to align with gatk indelRealigner

Validated vs: samtools, pindel, prism

Algorithm: split-read, paired-end, soft-clipped. align unmapped reads at both ends to look for indels

Description: uses a split-read approach to identify the precise breakpoints for indels of size less than a user specified threshold, and supplements that with a paired-end approach to identify larger variants that are frequently missed with the split-read approach

breakseek

Notes: describes parameters for competitors, works reasonably well for all size indels, estimates level of heterozygosity

Validated vs: pindel, lumpy, crest, soapindel, breakdancer, prism, delly

Algorithm: soft-clipping/breakread break points, paired-end span to validate indel, sophisticated probabilistic scoring model

Description: unbiasedly and efficiently detect both homozygous and heterozygous INDELs, ranging from several base pairs to over thousands of base pairs, with accurate breakpoint and heterozygosity rate estimations

raptr-sv

Notes: sensitivity for tandem duplications

Algorithm: discordant read-pair, split read, soft-clip, filter

Description: combines discordant read-pair and split-read predictions to generate highly confident SV calls, which can be filtered at runtime for improved accuracy.

scanindel

Algorithm: Identify quality soft-clipped reads, cluster them, and use a binomial distribution to identify breakpoints. Remap these breakpoint contigs and unmapped reads to the reference with BLAT, and reassemble breakpoint contigs de novo with inchworm/Trinity and map them with BLAT. Both the remapped and the reassembled paths produce BAM files on which freeBayes calls indels.

Description: integrates multiple signals from all three sources (gapped alignment, split reads and de novo assembly) allows for more sensitive indel discovery than methods examining merely one or two signals. Our framework scans the initial mapping file from a gapped NGS aligner and refines the alignment of the soft-clipped reads meeting tiered criteria. Next, de novo assembly is performed for the selected soft-clipped reads and unmapped reads. Subsequent to the re-alignment and assembly, we have applied a Bayesian haplotype-based variant caller to detect indels.
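
One plausible reading of "binomial distribution to identify breakpoints" is a tail test: given the read depth at a position, how surprising is the observed number of soft-clipped reads if clipping were only background noise? The sketch below shows that calculation. The background rate `error_rate`, the cutoff `alpha`, and the function names are assumptions for illustration and are not taken from ScanIndel.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def is_breakpoint(clipped, depth, error_rate=0.01, alpha=1e-3):
    """Flag a position when its clipped-read count is unlikely under noise alone."""
    return binom_tail(clipped, depth, error_rate) < alpha

print(is_breakpoint(8, 20))   # many clipped reads at modest depth
print(is_breakpoint(0, 20))   # no clipped reads
```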

speedseq

Notes: assumes diploid. uses pedigree+ to call validation variants. discusses parameters. reports confidence score.

Algorithm: meta. freebayes, lumpy, cnvnator, and custom caller svtyper.

lumpy

Notes: sensitive to low MAF. not great for small dels

Validated vs: gasvpro, delly, pindel

Used by: bcbio, metasv, bioconda

Compared by: bcb8, bcb3

Algorithm: merge read-pair, split read, read-depth in a breakpoint probability map. classify and cluster

Description: LUMPY integrates disparate signals by converting them to a common format in which the two predicted breakpoint intervals in the reference genome are represented as paired probability distributions.

scalpel

Notes: indel only. exome-capture data. de novo calls. slow, not for wgs. does indel normalization

Validated vs: gatk hc, SOAPindel

Used by: somaticseq, bcbio, bioconda

Compared by: Bcb8

Algorithm: de bruijn graph traversal, local assembly with iterative k-mer k value reassessment to eliminate repeats.

Description: localized micro-assembly of specific regions of interest with the goal of detecting mutations with high accuracy and increased power. It is based on the de Bruijn graph assembly paradigm and implements an on-the-fly repeat composition analysis coupled with a self-tuning k-mer strategy
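
The idea behind a self-tuning k-mer strategy can be shown with a small sketch: start with a small k and increase it until the region no longer contains a repeated k-mer, since repeats create cycles in a de Bruijn graph. This only checks the reference window; the starting value, step, and upper bound are hypothetical, and Scalpel's actual repeat analysis is more involved.

```python
def has_repeat(seq, k):
    """True if any k-mer occurs more than once in seq."""
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False

def choose_k(window, k_min=5, k_max=25, step=2):
    """Smallest k in [k_min, k_max] for which the window has no repeated k-mer."""
    for k in range(k_min, k_max + 1, step):
        if not has_repeat(window, k):
            return k
    return None  # the window stays repetitive over the whole range

window = "ACGTACGTTTGCA"
print(choose_k(window, k_min=3, k_max=9))
```

In this window `ACG` occurs twice, so k=3 is rejected and the search moves on to k=5.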

gindel

Notes: indels > 50bp only. efficient

Validated vs: pindel, cleversv

Algorithm: SVM on 7 features: discordant pair, split-read, read depth, concordant encompassing pair, single-end-mapped pair, partially mapped reads, fully-mapped spanning reads

Description: An approach for calling genotypes of both insertions and deletions from sequence reads. GINDEL uses a machine learning approach which combines multiple features extracted from next generation sequencing data. It performs well for insertion genotyping on both simulated and real data. GINDEL can not only call genotypes of insertions and deletions (both short and long) for high and low coverage population sequence data, but also is more accurate and efficient than other approaches.

gustaf

Notes: small validation set. claims best for small SVs 30-100bp. might work with FFPE

Validated vs: delly, pindel

Used by: bioconda

Algorithm: local alignment between unmapped reads shows breakpoints

Description: based on a generic multi-split alignment strategy that can identify SV breakpoints with base pair resolution.

smufin

Notes: produces SV and SNV calls. paired exome fastq input. somatic calls. parallelized

Validated vs: mutect, breakdancer, pindel, delly, crest

Algorithm: directly compares reads "quaternary sequence tree"

Description: directly compares sequence reads from normal and tumor genomes to accurately identify and characterize a range of somatic sequence variation, from single-nucleotide variants (SNV) to large structural variants at base pair resolution.

socrates

Notes: somatic calls. needs parameterization. fast. split-read only algorithms are better for short read FFPE data. "On real tumour data without additional information, we find it impractical to run at its most sensitive settings, but it is easily tuned."

Algorithm: re-aligns and clusters soft-clipped reads

Description: uses split reads to find breakpoints. It is optimized to be fast and extremely sensitive.

ulysses

Notes: mate-pair only

Description: assessing, in a principled manner, the statistical significance of each possible variant (duplications, deletions, translocations, insertions and inversions) against an explicit model for the generation of experimental noise.

vivar

Notes: needs a reference set for sequencing error model

Description: facilitates the processing, analysis and visualization, of structural variation based on massive parallel sequencing data

bellerophon

Notes: detects interchromosomal translocations. really basic caller

Validated vs: GASV, Breakdancer, SVDetect, and CREST

Algorithm: discordant reads, soft-clipped

Description: uses discordant read pairs and "soft-clipped" reads to predict the location of the precise breakpoints. for each chimeric breakpoint, attempts to classify it as a participant in an unbalanced translocation, balanced translocation, or interchromosomal insertion.

sv-m

pesv-fisher

isvp

Notes: meta-caller. deletions only

Validated vs: breakdancer, delly, pindel, haplotypecaller

Algorithm: limit each method's calls to their optimal size ranges
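
Restricting each caller to the size range where it performs best amounts to a per-caller filter applied before the call sets are combined. The sketch below assumes calls are `(caller, chrom, start, end)` tuples; the size ranges in `EXAMPLE_RANGES` are made-up placeholders, not the ranges iSVP uses.

```python
# Hypothetical (min_size, max_size) in bp per caller; placeholders only.
EXAMPLE_RANGES = {
    "pindel": (1, 500),
    "delly": (300, 100_000),
    "breakdancer": (1_000, 1_000_000),
}

def restrict_to_size_range(calls, size_ranges):
    """Keep only calls whose length falls inside their caller's size range."""
    kept = []
    for caller, chrom, start, end in calls:
        lo, hi = size_ranges[caller]
        if lo <= end - start <= hi:
            kept.append((caller, chrom, start, end))
    return kept

calls = [
    ("pindel", "chr1", 1_000, 1_040),        # 40 bp, inside pindel's range
    ("pindel", "chr1", 5_000, 9_000),        # 4 kb, outside pindel's range
    ("breakdancer", "chr2", 10_000, 10_200), # 200 bp, too small
    ("delly", "chr2", 20_000, 25_000),       # 5 kb, inside delly's range
]
print(restrict_to_size_range(calls, EXAMPLE_RANGES))
```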

meerkat

Notes: somatic calls. specific complex rearrangement events like dna repair pathways.

Algorithm: discordant read pair clustering with refinement

Description: considers local clusters of discordant read pairs to recognize specific complex events. uses split, clipped, and multiple-aligned reads

soapindel

Notes: calls indels. similar sensitivity and specificity for small indels, higher sensitivity for large indels. might be slow. should call SNPs too. assigns confidence q-scores. weird validation looking at hg19 vs venter genome and chimpanzee vs hg19

Validated vs: dindel, pindel, gatk

Algorithm: identifies breakpoints from discordant reads, multi-path de bruijn graph assembly

Description: assign all unmapped reads with a mapped partner to their expected genomic positions and then perform extensive de novo assembly on the regions with many unmapped reads to resolve homozygous, heterozygous, and complex indels by exhaustive traversal of the de Bruijn graph

softsearch

Notes: slow but high TP rate. works at low depth but needs parameters adjusted. Levenshtein Distance "confidence scores"

Validated vs: breakdancer, delly, crest, svseq

Algorithm: soft-clipping heuristic, number of soft-clipped reads per position

Description: Assuming soft clipping delineates the exact breakpoint position and direction, DRPs overlapping such soft-clipped areas should already contain the information about the type and size of SV, obviating the need for secondary alignments.

tigra

Notes: calls breakpoints. population calls locate common alleles. de novo? uses population data, low FDR

Algorithm: iterative breakpoint collection, de bruijn graph, assembles the breakpoints

delly

Notes: calls SV and CNV. high sensitivity and specificity, lower sensitivity to small deletions

Validated vs: pindel, breakdancer, gasv, hydra

Used by: bcbio, metasv, bioconda

Compared by: bcb8, bcb3

Algorithm: integrates short insert paired-ends and long-range mate-pairs to identify discordant pairs, then uses split-read alignments to identify breakpoints

Description: integrates short insert paired-ends, long-range matepairs and split-read alignments to accurately delineate genomic rearrangements

cn.mops

Notes: calls CNV (move to CNV). population-based calls.

Used by: bioconda

Compared by: bcb3

Description: decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions
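
To give a feel for the Poisson part of this model, the sketch below picks the integer copy number whose expected read count best explains one observed count. It is a deliberately simplified single-sample illustration: cn.mops itself fits a mixture across samples, which is not reproduced here. `diploid_mean` (expected count at copy number 2), `max_cn`, and the small `eps` used for copy number 0 are assumptions.

```python
import math

def poisson_logpmf(x, mean):
    """Log of the Poisson probability of observing count x with the given mean."""
    return x * math.log(mean) - mean - math.lgamma(x + 1)

def most_likely_copy_number(count, diploid_mean, max_cn=8, eps=0.05):
    """Integer copy number maximizing the Poisson likelihood of count.

    The expected count scales as diploid_mean * cn / 2; eps replaces cn=0
    so that a few stray reads in a deleted region do not give log(0).
    """
    def loglik(cn):
        return poisson_logpmf(count, diploid_mean * max(cn, eps) / 2)
    return max(range(max_cn + 1), key=loglik)

for observed in (0, 50, 100, 150):
    print(observed, most_likely_copy_number(observed, diploid_mean=100))
```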

battenberg

breakpointer

Notes: single-end read breakpoint locator

Validated vs: pindel

Description: By taking advantage of local non-uniform read distribution and misalignments created by SVs, Breakpointer scans the alignment of single-end reads to identify regions containing potential breakpoints.

clever

Notes: better at 20-100bp size range

Validated vs: gasv, variationhunter, breakdancer, hydra

Used by: bioconda

Algorithm: clustering on concordant pairs

Description: enumerates all max-cliques and statistically evaluates them for their potential to reflect insertions or deletions.

forestsv

Algorithm: random forests

gasvpro

Notes: validates vs HuRef, NA18507, and NA12878, has ROC curves. models uncertainty in call/reference overlap for truth calls to be more precise

Validated vs: hydra, breakdancer, CNVer

Algorithm: joint P on paired read and read depth

Description: Combines two common signals of structural variation, read depth and discordant paired-read mappings, into a single probabilistic model.

hugeseq

Notes: SNP/SV/CNV caller. "whole pipeline" includes PCR artifact removal, GATK realignment, variant calling, functional annotation with Annovar.

Algorithm: SNPs with UnifiedGenotyper and SAMtools, indels with Dindel, CNV/SV with BreakDancer, Pindel, CNVnator, BreakSeq. basic vote 2+ gives "high confidence"
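
The "2+ votes" rule is simple to express in code. The sketch below treats a call as an exact `(chrom, start, end, type)` tuple, which is a simplification: a real aggregation step would need tolerant breakpoint matching. The function name and input layout are assumptions for illustration.

```python
from collections import defaultdict

def high_confidence(calls_by_caller, min_votes=2):
    """Map each call reported by at least min_votes callers to those callers."""
    votes = defaultdict(set)
    for caller, calls in calls_by_caller.items():
        for call in calls:
            votes[call].add(caller)
    return {call: sorted(callers)
            for call, callers in votes.items()
            if len(callers) >= min_votes}

calls_by_caller = {
    "breakdancer": [("chr1", 1000, 5000, "DEL"), ("chr2", 300, 900, "DEL")],
    "pindel": [("chr1", 1000, 5000, "DEL")],
    "cnvnator": [("chr3", 10000, 90000, "DUP")],
}
print(high_confidence(calls_by_caller))
```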

prism

Description: uses a split-alignment approach informed by the mapping of paired-end reads, hence enabling breakpoint identification of multiple SV types, including arbitrary-sized inversions, deletions and tandem duplications

splitread

Notes: exome only. de novo calls. compares read depth between parents and child to identify de novo mutations

Algorithm: discordant pairs clustering, map with mrsFAST and hamming distance, call anomalous mappings, split unmapped reads, search.

Description: searches for clusters of mate pairs where one end maps to the reference genome but the other end does not because it traverses a breakpoint creating a mapping inconsistency with respect to the reference sequence

clipcrop

Notes: not for somatic. doesn't recognize useful mutation types according to socrates authors

Validated vs: breakdancer, cnvnator, pindel

Description: A soft-clipped sequence is an unmatched fragment in a partially mapped read

crest

Notes: somatic calls. made for somatic comparisons, lower performance for small deletions

Algorithm: single read soft clipping, classification

Description: uses the soft-clipping reads to directly map the breakpoints of structural variations

genomestrip

Notes: calls SV/CNV. population-based calls. validated by 1kGP, "most sensitive and accurate". less sensitive to small SVs. 2.0 adds CNV detection

Validated vs: spanner, pindel, breakdancer, pemer, cnvnator

Algorithm: discordant read pair clustering, reassemble breakpoint-spanning reads by allele, read-depth for copy number estimate, align unmapped reads to breakpoint database

Description: designed to find shared variation using data from multiple individuals. Genome STRiP looks both across and within a set of sequenced genomes to detect variation.

ingap-sv

Notes: good validation against 12878, differentiates homo- and hetero-zygous variants

Validated vs: Breakdancer, variationhunter, spanner, PEMer, cortex, pindel

Algorithm: paired-end mapping & depth of coverage

hydra

Notes: sanger split-read & illumina paired-end input.

Validated vs: None.

Algorithm: split-read + paired end

age

Notes: improved alignment algorithm, but does not call variants. narrow scope validation, proof of theory. has been modified for metasv inclusion

Used by: bioconda

Algorithm: read-depth

Description: AGE for Alignment with Gap Excision, finds the optimal solution by simultaneously aligning the 5′ and 3′ ends of two given sequences and introducing a ‘large-gap jump’ between the local end alignments to maximize the total alignment score. We also describe extensions allowing the application of AGE to tandem duplications, inversions and complex events involving two large gaps.

slope

Notes: basic simulated data

Validated vs: pindel, breakdancer

Description: detect sequence breakpoints from only one side of a split read, and therefore does not rely on the insert size for detection.

svdetect

Notes: paired-end/mate pairs input.

Validated vs: GasV

Algorithm: discordant read pairs, adds mate-pairs, sliding window for clustering

Description: uses anomalously mapped read pairs provided by current short read aligners to localize genomic rearrangements and classify them according to their type, e.g. large insertions-deletions, inversions, duplications and balanced or unbalanced interchromosomal translocations.

svmerge

Algorithm: meta-caller - BDMax, Pindel, SECluster, RetroSeq, RDXplorer

pindel

Notes: slow, high FP rate

Used by: metasv, bioconda

Algorithm: split-read clustering, pattern growth algorithm to search local space for unmapped (split) read

Description: detect breakpoints of large deletions (1bp-10kbp) and medium sized insertions (1-20bp) from paired-end short reads

breakdancer

Notes: somatic calls. confidence scores, use Q>80

Used by: metasv

Validated vs: MoDIL, VariationHunter

Algorithm: paired-end MAQ calls; classify, cluster, multi-nomial Poisson-based confidence score

Description: predicts large and small 10-100bp indels, inversions and translocations
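
The "classify" step shared by the discordant-read callers on this page can be sketched generically. The rules below are the textbook signatures for a forward-reverse paired-end library and are not BreakDancer's exact logic; the `n_sd` cutoff, the use of leftmost coordinates as the span, and the function name are assumptions.

```python
def classify_pair(chrom1, pos1, strand1, chrom2, pos2, strand2,
                  mean_insert, sd_insert, n_sd=3):
    """Label a read pair by the SV type its mapping pattern suggests."""
    if chrom1 != chrom2:
        return "translocation"
    if strand1 == strand2:
        return "inversion"
    # Order the two reads by position so the left read comes first.
    if pos1 > pos2:
        pos1, strand1, pos2, strand2 = pos2, strand2, pos1, strand1
    if strand1 == "-" and strand2 == "+":
        return "tandem_duplication"   # everted pair
    span = pos2 - pos1
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"             # reads map further apart than expected
    if span < mean_insert - n_sd * sd_insert:
        return "insertion"            # reads map closer together than expected
    return "concordant"

print(classify_pair("chr1", 100, "+", "chr1", 5000, "-", 300, 30))
print(classify_pair("chr1", 100, "+", "chr1", 400, "-", 300, 30))
```

Pairs with the same label and nearby coordinates would then be clustered, and each cluster scored as one candidate variant.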

breakseq

Used by: metasv

Algorithm: map reads to known breakpoints from a database

Description: scanning the reads from short-read sequenced genomes against our breakpoint library to accurately identify previously overlooked SVs
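
A breakpoint library can be pictured as a set of junction sequences built from the flanks of known SVs, against which reads are matched. The sketch below builds the junction for a deletion and counts reads that cross it. It uses exact substring matching and 0-based half-open coordinates, which are simplifications chosen for the example; `flank`, `min_overhang`, and the function names are hypothetical.

```python
def deletion_junction(reference, start, end, flank):
    """Sequence spanning a deletion of reference[start:end], flank bases per side."""
    return reference[start - flank:start] + reference[end:end + flank]

def count_supporting_reads(reads, junction, flank, min_overhang=3):
    """Count reads that match the junction and cross the breakpoint.

    A read counts only if it covers at least min_overhang bases on each
    side of the breakpoint, which sits at index `flank` in the junction.
    Only the first match of each read is considered.
    """
    supporting = 0
    for read in reads:
        pos = junction.find(read)
        if pos == -1:
            continue
        if pos <= flank - min_overhang and pos + len(read) >= flank + min_overhang:
            supporting += 1
    return supporting

reference = "ACGTTGCAAGCTTAGGCATC"
junction = deletion_junction(reference, start=8, end=12, flank=5)
reads = ["GCATAG", "TTGCA", "CATCAT"]
print(junction, count_supporting_reads(reads, junction, flank=5))
```

Here only `GCATAG` crosses the junction; `TTGCA` lies entirely on the left flank and `CATCAT` does not match at all.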

pemer

Validated vs: PEM

Algorithm: split-read clustering, merges clusters
