SVIM - core-unit-bioinformatics/knowledge-base GitHub Wiki

author | date       | tags
SW     | 2023-01-27 | software, svim, control, parameters

SVIM

Structural variant identification using raw long reads

Citable sources

  • 2019 initial release:
    David Heller, Martin Vingron, SVIM: structural variant identification using mapped long reads, 
    Bioinformatics, Volume 35, Issue 17, 1 September 2019, Pages 2907-2915
    

Observation

SVIM reports far more SV calls in every category than other SV callers do.

Reason for this behavior

As stated in the Wiki of SVIM under point "5. Analysis and filtering" (see https://github.com/eldariont/svim/wiki#5-analysis-and-filtering):

 Unlike many other SV callers, SVIM does not filter its output but writes out all SV calls and their 
 respective scores. This means that even low-scoring calls supported by only a single read are contained in
 the output. Actually, often the majority of calls output by SVIM are calls supported by one or two reads.
 These calls with very low score are almost always caused by sequencing or alignment errors. It is therefore 
 highly recommended to filter the variant calls that SVIM produces based on these scores.

How do I filter the SV calls after running SVIM?

David Heller (one of the developers of SVIM) suggested, in a personal communication with Jana Ebler, using a coverage-based quality score cutoff:

"In my experience, choosing a cutoff of 1/4 of the average coverage yield a good precision/recall trade-off:
 e.g. a cutoff of 3 for 10x, 5 for 20x, 7 for 30x, ...". 
 The Wiki recommends: "For high-coverage datasets (>40x), we would recommend a threshold of 10-15."

This is only an estimate; the resulting call set should be compared against the output of other SV callers.
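The rule of thumb quoted above can be computed directly. The following is a minimal sketch, assuming the average coverage is already known; the function name `svim_cutoff` is hypothetical and not part of SVIM. It prints the unrounded quotient, because the quoted examples (3, 5, 7) round the exact values 2.5, 5 and 7.5 only loosely.

```shell
# Hypothetical helper (not part of SVIM): one quarter of the average coverage.
svim_cutoff() {
    # $1 = average read coverage, e.g. 20 for a 20x dataset
    awk -v cov="$1" 'BEGIN { printf "%.1f\n", cov / 4 }'
}

svim_cutoff 20   # prints 5.0
```

The value is only a starting point and should be rounded to whichever integer cutoff gives a plausible call set.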

Filtering the variant calls produced by SVIM is done by removing all calls with a score below the desired threshold, e.g. only including calls with score >= 10:

bcftools view -i 'QUAL >= 10' variants.vcf
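Because the inflated counts affect every SV category, it can be useful to check the per-type totals after filtering. The following is a minimal sketch, assuming each record carries its type in an `SVTYPE=` entry of the INFO column (column 8); the function name `count_svtypes` is hypothetical.

```shell
# Hypothetical helper: read VCF records on stdin, print "SVTYPE<TAB>count".
count_svtypes() {
    awk -F '\t' '
        /^#/ { next }                              # skip header lines
        {
            type = "NA"                            # fallback if no SVTYPE tag
            n = split($8, info, ";")
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^SVTYPE=/) type = substr(info[i], 8)
            count[type]++
        }
        END { for (t in count) printf "%s\t%d\n", t, count[t] }
    ' | sort
}

# Usage, assuming bcftools is installed and variants.vcf is the SVIM output:
#   bcftools view -i 'QUAL >= 10' variants.vcf | count_svtypes
```

Reading from standard input keeps the helper independent of how the filtering itself is done.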

The Wiki also suggests using a recent study by the HGSVC as a reference (see Ebert et al., 2021, https://www.science.org/doi/10.1126/science.abf7117):

A good approach could also be to select a cutoff that returns the expected or desired number of calls. 
A recent study by the Human Genome Structural Variation Consortium (HGSVC), for instance, detected on 
average 24,653 SVs per diploid human genome.
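One way to explore this is to tabulate how many calls would survive at each candidate cutoff and pick the one closest to the expected number. The following is a minimal sketch, assuming an uncompressed VCF with the score in the QUAL column (column 6); the function name `count_by_cutoff` and the default range of 1 to 20 are illustrative choices, not taken from the SVIM Wiki.

```shell
# Hypothetical helper: print "cutoff<TAB>number of calls with QUAL >= cutoff"
# for every integer cutoff from 1 to the given maximum (default 20).
# Usage: count_by_cutoff variants.vcf 20
count_by_cutoff() {
    awk -F '\t' -v max="${2:-20}" '
        /^#/ { next }                    # skip header lines
        $6 == "." { next }               # skip records without a score
        {
            for (c = 1; c <= max; c++)
                if ($6 + 0 >= c) n[c]++  # record passes cutoff c
        }
        END {
            for (c = 1; c <= max; c++)
                printf "%d\t%d\n", c, n[c] + 0
        }
    ' "$1"
}
```

The table counts all SV types together, so it is only a rough guide when comparing against a per-genome figure such as the one reported by the HGSVC.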

For more details go to https://github.com/eldariont/svim/wiki

Additional information regarding SVIM-asm

A related tool, SVIM-asm (https://github.com/eldariont/svim-asm), calls SVs from haploid or diploid genome-to-genome alignments. It does not calculate a QUAL score, so its calls cannot be filtered for trustworthiness in the same way, which results in a very high number of calls for all categories.
