Analysis Expanded Help - mattravenhall/SV-Pop GitHub Wiki

This is a full list of arguments, but some details (for example default values) may be out of date. For the most recent version, run SVPop -h.

Required Arguments

--inFile, -F List of vcf files, one per line, each containing a single sample and a single SV model. Each vcf file should be version 4.2 and feature the details listed below; examples are available in the test set. Default DELLY output should conform to this, but output from other callers may require conversion. Secondary SV files do not need to be vcf files of this sort (see --dirConcordance). Lines beginning with a double hash (##) are treated as comments.

QUAL: should feature either LowQual or PASS. Alternatively set --minimumQuality to 0 to skip filtering by variant quality.
INFO: must begin with IMPRECISE/PRECISE and include END (the end bp), SVTYPE (one of DEL, DUP, INS, or INV), PE, SR, and MAPQ (number of supporting paired-end reads, number of supporting split reads, and average mapping quality respectively; required if moreQuality is True), and INSLEN (length of insertion; required if the SV model is INS).
FORMAT: must include GT (genotype; required if pullPhasing is True, and if GT is absent all calls will have a PercMissing of 1.0), plus DR (high-quality reference pairs), DV (high-quality variant pairs), RR (reference-supporting reads), and RV (variant-supporting reads), all of which are required if includeReads is True. Some input formats, such as those from LUMPY, may lack this genotyping information by default.
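
The requirements above can be checked before running an analysis. The following is a minimal sketch in Python, not part of SV-Pop: the function name `check_vcf_record` and its parameters are hypothetical, it assumes standard tab-separated VCF columns (INFO in column 8, FORMAT in column 9), and it always checks for GT even though GT is only strictly required when pullPhasing is True.

```python
def check_vcf_record(line, model="DEL", more_quality=True, include_reads=True):
    """Return a list of problems found in one tab-separated VCF data line.

    Hypothetical helper: it mirrors the INFO/FORMAT requirements listed
    above and is not SV-Pop's own validation code.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return ["expected at least 10 tab-separated columns"]

    info_items = fields[7].split(";")
    info_keys = {item.split("=", 1)[0] for item in info_items}
    format_keys = set(fields[8].split(":"))

    problems = []
    if info_items[0] not in ("IMPRECISE", "PRECISE"):
        problems.append("INFO does not begin with IMPRECISE/PRECISE")

    # END and SVTYPE are always needed; the rest depend on the options in use.
    required_info = {"END", "SVTYPE"}
    if more_quality:
        required_info |= {"PE", "SR", "MAPQ"}
    if model == "INS":
        required_info.add("INSLEN")
    for key in sorted(required_info - info_keys):
        problems.append(f"INFO is missing {key}")

    # Assumption: GT is always checked, read counts only with include_reads.
    required_format = {"GT"}
    if include_reads:
        required_format |= {"DR", "DV", "RR", "RV"}
    for key in sorted(required_format - format_keys):
        problems.append(f"FORMAT is missing {key}")
    return problems


# Made-up record for illustration only.
example = "\t".join([
    "CHR01", "100", "DEL001", "N", "<DEL>", ".", "PASS",
    "PRECISE;SVTYPE=DEL;END=300;PE=5;SR=3;MAPQ=60",
    "GT:DR:DV:RR:RV", "0/1:10:5:8:4",
])
print(check_vcf_record(example))
```

An empty list means the record satisfies every requirement the sketch checks; output from callers such as LUMPY would be expected to report missing FORMAT keys.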

--model, -M Variant model. (DUP, DEL, INS, INV)

--refFile, -R Annotation reference file. (gtf, gff, csv, tsv) (required if doAnnotation is True)

Additional Arguments

--subPops, -P Tab-separated file with 'Samples' (file names without extensions) and columns for each relevant sub-population. (required for post-analysis visualisation)

Example:

Sample	Continent	Country
Sample1	Africa	Kenya
Sample2	Africa	Sudan
Sample3	Asia	Thailand
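
To illustrate how a file in this layout groups samples, here is a short Python sketch. It is an assumption about usage, not SV-Pop code: the function name `read_sub_pops` is hypothetical, and the sketch takes the sample name from the first column whatever its header is called.

```python
import csv
import io
from collections import defaultdict


def read_sub_pops(handle, column):
    """Group sample names by the values found in one sub-population column."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    index = header.index(column)  # raises ValueError if the column is absent
    groups = defaultdict(list)
    for row in reader:
        if row:  # skip blank lines
            groups[row[index]].append(row[0])
    return dict(groups)


# The example table from above, held in memory instead of on disk.
example = io.StringIO(
    "Sample\tContinent\tCountry\n"
    "Sample1\tAfrica\tKenya\n"
    "Sample2\tAfrica\tSudan\n"
    "Sample3\tAsia\tThailand\n"
)
print(read_sub_pops(example, "Continent"))
```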

--refFormat Bypass reference format detection. (gtf, gff, csv, tsv) (default: tsv)

--refType Specify a feature type to annotate with. (gtf/gff only) (default: gene)

--refTypeID Specify a feature ID to annotate with. (gtf/gff only) (default: gene)

--chr Specify a subset of chromosomes, separated by commas. (default: All)

Output Formatting

--outFile, -O Output file name (shared for all outputs). (default: outFile)

--writeVariants Whether to write out pre-windows variants file. (default: True)

--writeWindows Whether to write out windows stats file. (default: True)

--doAnnotation Whether to annotate the final output. (default: True)

--includeReads Whether to include median read counts. (default: False)

--moreQuality Whether to include PE count, SR count and median MAPQ. (default: True)

--suppressWarnings Whether to silence warning messages (not errors). (default: True)

--filterGaps Whether to remove variants overlapping known gaps. (default: False)

--gapsFile Reference csv file (without header) containing chromosome regions for exclusion.

Example:


CHR01	100	300
CHR01	600	750
CHR02	320	340
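
As a rough illustration of gap filtering, the Python sketch below loads regions in the layout shown above and tests whether a variant overlaps any of them. The function names are hypothetical, columns are split on whitespace to match the example, and coordinates are assumed to be inclusive at both ends; SV-Pop's own handling may differ.

```python
def load_gaps(lines):
    """Parse 'chromosome start end' lines into a dict of region lists."""
    gaps = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        chrom, start, end = line.split()[:3]
        gaps.setdefault(chrom, []).append((int(start), int(end)))
    return gaps


def overlaps_gap(chrom, start, end, gaps):
    """True if [start, end] shares at least one base with a gap on chrom."""
    return any(
        start <= gap_end and end >= gap_start
        for gap_start, gap_end in gaps.get(chrom, [])
    )


gaps = load_gaps(["CHR01\t100\t300", "CHR01\t600\t750", "CHR02\t320\t340"])
print(overlaps_gap("CHR01", 250, 400, gaps))  # variant reaching into a gap
print(overlaps_gap("CHR01", 301, 599, gaps))  # variant between two gaps
```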

--writeSamples Whether to write out sample IDs with each variant. (default: True)

--mergeChr Merge per-chromosome files into one file. (default: True)

--doFst Whether to calculate Fst values for sub-populations. (default: True)

--windowSize Window size. (default: 1000)

--windowStep Window step. (default: 500)
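
With the default size of 1000 and step of 500, consecutive windows overlap by half. The Python sketch below shows one way such windows could be generated; the function name `sliding_windows`, the 0-based half-open coordinates, and the truncation of the last windows at the chromosome end are all assumptions made for illustration.

```python
def sliding_windows(chrom_length, size=1000, step=500):
    """Yield (start, end) windows, 0-based and half-open, along a chromosome."""
    for start in range(0, chrom_length, step):
        # Assumption: windows running past the chromosome end are truncated.
        yield start, min(start + size, chrom_length)


for window in sliding_windows(2000):
    print(window)
```

Setting the step equal to the size would give non-overlapping windows.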

Filtering Options

--doFiltering Apply filtering step prior to output. (default: True)

--minLength Minimum length of structural variants. (default: 0)

--maxLength Maximum length of structural variants. (default: 100000)

--minimumQuality Minimum threshold for Quality column. (default: 0.9)

--minimumPrecision Minimum threshold for Precision column. (default: None)

--maximumPercRef Maximum threshold for perc homozygous reference. (default: 0.1)

--maximumPercHet Maximum threshold for perc heterozygous. (default: 0.3)

--minimumPercAlt Minimum threshold for perc homozygous alternative. (default: None)

--maximumPercMissing Maximum threshold for perc missing. (default: 0.1)

--removeIntergenic Whether to remove purely intergenic variants. (default: False)

--excludeDupPercHet Whether to exclude dups from PercHet filter. (default: False)

--minimumPE Minimum supporting paired reads per-sample per-variant. (default: 0)

--minimumSR Minimum supporting split reads per-sample per-variant. (default: 0)

--minimumMAPQ Minimum median mapping quality per-sample per-variant. (default: 0)
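
The thresholds above combine as a set of independent checks. The Python sketch below shows that logic for a few of them using the documented defaults. It is illustrative only: the function name `passes_filters`, the dictionary keys other than PercMissing, and the choice of inclusive bounds are assumptions, not SV-Pop's implementation.

```python
def passes_filters(variant, min_length=0, max_length=100000,
                   max_perc_ref=0.1, max_perc_het=0.3, max_perc_missing=0.1):
    """Return True if a variant (a dict of per-variant stats) passes every check."""
    checks = [
        min_length <= variant["Length"] <= max_length,
        variant["PercRef"] <= max_perc_ref,
        variant["PercHet"] <= max_perc_het,
        variant["PercMissing"] <= max_perc_missing,
    ]
    return all(checks)


# Made-up values for illustration.
variant = {"Length": 500, "PercRef": 0.05, "PercHet": 0.2, "PercMissing": 0.0}
print(passes_filters(variant))
```

Thresholds with a default of None, such as --minimumPrecision and --minimumPercAlt, are left out of the sketch.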

Verification by Concordance

--filterConcordance Whether to filter variants by concordance. (default: False)

--dirConcordance Path to directory containing secondary variant files, each named .cnvs. (default: ./)

Chromosome	Start	End	Type	Length (optional)	Source (optional)
CHR_01	201	432	DEL	231	SV-Finder
CHR_01	620	820	DUP	200	FauxDUP
CHR_02	8671	8720	DEL	49	SV-Finder

--overlapConcordance Proportion of overlap required for verification. (default: 0.8)

--percConcordance Percentage of concordance required per variant. (default: 0)
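
As a sketch of how an overlap threshold such as --overlapConcordance might be applied, the Python below measures the share of a primary variant covered by a secondary call of the same type on the same chromosome. The function names are hypothetical, and defining overlap relative to the primary variant's length (rather than, say, reciprocal overlap) is an assumption. Length is taken as End minus Start, which matches the Length column in the example table above.

```python
def overlap_proportion(start, end, other_start, other_end):
    """Fraction of [start, end) that is covered by [other_start, other_end)."""
    length = end - start
    if length <= 0:
        return 0.0
    shared = min(end, other_end) - max(start, other_start)
    return max(shared, 0) / length


def is_verified(variant, secondary_calls, min_overlap=0.8):
    """True if any same-chromosome, same-type secondary call overlaps enough."""
    chrom, start, end, sv_type = variant
    return any(
        c == chrom and t == sv_type
        and overlap_proportion(start, end, s, e) >= min_overlap
        for c, s, e, t in secondary_calls
    )


# Secondary calls taken from the example table above.
secondary = [
    ("CHR_01", 201, 432, "DEL"),
    ("CHR_01", 620, 820, "DUP"),
    ("CHR_02", 8671, 8720, "DEL"),
]
print(is_verified(("CHR_01", 210, 430, "DEL"), secondary))
print(is_verified(("CHR_01", 100, 300, "DEL"), secondary))
```

The sketch covers only the per-call overlap test; how --percConcordance aggregates results across secondary files is not modelled here.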

Multi-Processing

--multithread Whether to split analysis over multiple cores. (default: True)

--threads Number of cores to utilise when multi-processing. (default: Depends on your machine)

Misc. Options

--help, -h Display this information.

--version, -v Display full version information.

Alternative Pipelines

--CONVERT Convert a variant output file into a window file.

--FILTER Filter a variant output file by a range of factors.

--MERGE-CHR Merge per-chromosome variants files into one file.

--MERGE-MODEL Merge by-model variants files into one file.

--SUBSET Create a subset of a given variant or window file.

--STATS Produce summary statistics for a variant or window file.

--PREPROCESS Process analysis output files for visualisation.