SVIM - core-unit-bioinformatics/knowledge-base GitHub Wiki
author | date | tags |
---|---|---|
SW | 2023-01-27 | software, svim, control, parameters |
Structural variant identification using raw long reads
-
2019 initial release:
David Heller, Martin Vingron, SVIM: structural variant identification using mapped long reads, Bioinformatics, Volume 35, Issue 17, 1 September 2019, Pages 2907-2915
SVIM reports far more SV calls in every category than other SV callers.
As stated in the Wiki of SVIM under point "5. Analysis and filtering" (see https://github.com/eldariont/svim/wiki#5-analysis-and-filtering):
Unlike many other SV callers, SVIM does not filter its output but writes out all SV calls and their
respective scores. This means that even low-scoring calls supported by only a single read are contained in
the output. Actually, often the majority of calls output by SVIM are calls supported by one or two reads.
These calls with very low score are almost always caused by sequencing or alignment errors. It is therefore
highly recommended to filter the variant calls that SVIM produces based on these scores.
How do I filter the SV calls after running SVIM?
David Heller (one of the developers of SVIM) suggested, in a personal communication with Jana Ebler, using a coverage-based quality score cutoff:
"In my experience, choosing a cutoff of 1/4 of the average coverage yield a good precision/recall trade-off:
e.g. a cutoff of 3 for 10x, 5 for 20x, 7 for 30x, ...".
The Wiki recommends: "For high-coverage datasets (>40x), we would recommend a threshold of 10-15."
This is just an estimate and has to be checked against the results of other SV callers!
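The rule of thumb quoted above can be sketched as a small shell helper. This is only an illustration: the function name `svim_cutoff` is hypothetical and not part of SVIM or bcftools. It prints the unrounded value of coverage / 4, because the example cutoffs in the quote (3, 5, 7) are approximate and no single rounding rule reproduces all three.

```shell
# Hypothetical helper (not part of SVIM): print the coverage/4 rule-of-thumb
# value for a given average coverage. Rounding is left to the user.
svim_cutoff() {
    # $1: average read coverage, e.g. 30 for a 30x dataset
    LC_ALL=C awk -v cov="$1" 'BEGIN { printf "%.1f\n", cov / 4 }'
}

svim_cutoff 10   # prints 2.5
svim_cutoff 20   # prints 5.0
svim_cutoff 30   # prints 7.5
```

The printed value is a starting point only; pick a nearby integer and compare the resulting call set with other SV callers.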
Filtering the variant calls produced by SVIM is done by removing all calls with a score below the desired threshold, e.g. only including calls with score >= 10:
bcftools view -i 'QUAL >= 10' variants.vcf
The Wiki also suggests using a recent study by the HGSVC as a reference (see Ebert et al., 2021, https://www.science.org/doi/10.1126/science.abf7117):
A good approach could also be to select a cutoff that returns the expected or desired number of calls.
A recent study by the Human Genome Structural Variation Consortium (HGSVC), for instance, detected on
average 24,653 SVs per diploid human genome.
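To pick a cutoff by the expected number of calls, it helps to tabulate how many calls survive at each threshold. The following is a minimal sketch, not an SVIM feature: the function name `count_calls_by_cutoff` is hypothetical, and it assumes an uncompressed VCF with a numeric score in the QUAL column (column 6).

```shell
# Hypothetical helper (not part of SVIM): for each integer cutoff from 1 to
# max, print the cutoff and the number of VCF records with QUAL >= cutoff,
# separated by a tab.
count_calls_by_cutoff() {
    # $1: uncompressed VCF file, $2: highest cutoff to tabulate (default 20)
    awk -F '\t' -v max="${2:-20}" '
        /^#/      { next }   # skip header lines
        $6 == "." { next }   # skip records without a score
        { for (c = 1; c <= max; c++) if ($6 + 0 >= c) n[c]++ }
        END { for (c = 1; c <= max; c++) printf "%d\t%d\n", c, n[c] + 0 }
    ' "$1"
}

# Example call (assumes variants.vcf exists in the working directory):
# count_calls_by_cutoff variants.vcf 20
```

The first cutoff whose count drops to roughly the expected number of SVs is a candidate threshold. It should still be checked against the coverage-based estimate and against other SV callers.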
For more details go to https://github.com/eldariont/svim/wiki
There is a related tool for haploid or diploid genome-genome alignments called SVIM-asm (https://github.com/eldariont/svim-asm). However, SVIM-asm does not calculate a QUAL score, so its calls cannot be filtered for trustworthiness, which results in a very high number of calls in all categories.
Sources: