Analysis Expanded Help - mattravenhall/SV-Pop GitHub Wiki
This is a full list of arguments, but some details (for example, default values) may be out of date. For the most recent version, run SVPop -h.
--inFile, -F
List of vcf files, one per-line, each containing a single sample and SV model.
Each vcf file (one sample and one SV model per vcf) should be version 4.2 and feature the following details; examples are available in the test set. Default DELLY output should conform to this, but other outputs may require conversion. Secondary SV files do not need to be vcf files of this sort (see dirConcordance). Lines beginning with a double hash (##) will be considered as comments.
- QUAL: should feature either LowQual or PASS. Alternatively, set --minimumQuality to 0 to skip filtering by variant quality.
- INFO: must begin with IMPRECISE/PRECISE and include END (indicating the end bp), SVTYPE (as either DEL, DUP, INS, or INV), PE, SR, MAPQ (number of supporting paired-end reads, number of supporting split reads, average mapping quality; required if moreQuality is True), and INSLEN (length of insertion; required if the SV model is INS).
- FORMAT: must include GT (genotype; required if pullPhasing is True. If 'GT' is absent, all calls will have 1.0 PercMissing), and DR (high-quality reference pairs), DV (high-quality variant pairs), RR (reference-supporting reads), and RV (variant-supporting reads) (all required if includeReads is True). Some input formats, such as those from LUMPY, may lack this genotyping information by default.
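The INFO requirements above can be checked before running an analysis. The following is a minimal sketch, not part of SV-Pop: the helper name `missing_info_keys` and the example INFO string are made up for illustration, and it assumes INFO entries are separated by semicolons as in standard vcf files.

```python
# Hypothetical pre-flight check, not part of SV-Pop.
REQUIRED_INFO_KEYS = ("END", "SVTYPE")
QUALITY_INFO_KEYS = ("PE", "SR", "MAPQ")  # only needed when moreQuality is True


def missing_info_keys(info, more_quality=True):
    """Return the expected INFO entries that are absent from a vcf INFO string."""
    entries = info.split(";")
    problems = []
    # The first entry should be the IMPRECISE/PRECISE flag.
    if entries[0] not in ("IMPRECISE", "PRECISE"):
        problems.append("IMPRECISE/PRECISE")
    # Collect key=value entries; bare flags are skipped here.
    fields = dict(e.split("=", 1) for e in entries if "=" in e)
    expected = list(REQUIRED_INFO_KEYS)
    if more_quality:
        expected += QUALITY_INFO_KEYS
    problems += [key for key in expected if key not in fields]
    # INSLEN is only required for insertions.
    if fields.get("SVTYPE") == "INS" and "INSLEN" not in fields:
        problems.append("INSLEN")
    return problems


# Made-up INFO string for illustration only.
example = "PRECISE;SVTYPE=DEL;END=432;PE=5;SR=3;MAPQ=60"
print(missing_info_keys(example))
```

An empty list means every entry the sketch looks for is present; it does not validate the values themselves.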
--model, -M
Variant model. (DUP, DEL, INS, INV)
--refFile, -R
Annotation reference file. (gtf, gff, csv, tsv) (if doAnnotation is True)
--subPops, -P
Tab-separated file with 'Samples' (file names without extensions) and columns for each relevant sub-population. (required for post-analysis visualisation)
Example:
Sample | Continent | Country |
---|---|---|
Sample1 | Africa | Kenya |
Sample2 | Africa | Sudan |
Sample3 | Asia | Thailand |
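To illustrate the layout of the sub-populations file, here is a small sketch that reads a tab-separated file shaped like the example table. It is not SV-Pop's own loader: the function name `load_subpops` is hypothetical, the file contents are inlined for demonstration, and the first column is assumed to hold the sample names.

```python
import csv
import io

# Inlined stand-in for a real tab-separated file, mirroring the example table.
SUBPOPS_TEXT = (
    "Sample\tContinent\tCountry\n"
    "Sample1\tAfrica\tKenya\n"
    "Sample2\tAfrica\tSudan\n"
    "Sample3\tAsia\tThailand\n"
)


def load_subpops(handle):
    """Map each sample name to a dict of its sub-population labels."""
    reader = csv.DictReader(handle, delimiter="\t")
    sample_col = reader.fieldnames[0]  # assumption: first column names the sample
    return {
        row[sample_col]: {k: v for k, v in row.items() if k != sample_col}
        for row in reader
    }


subpops = load_subpops(io.StringIO(SUBPOPS_TEXT))
print(subpops["Sample2"])
```

With a file on disk, the same function can be given `open(path, newline="")` in place of the `io.StringIO` object.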
--refFormat
Bypass reference format detection. (gtf,gff,csv,tsv) (default: tsv)
--refType
Specify a feature type to annotate with. (gtf/gff only) (default: gene)
--refTypeID
Specify a feature ID to annotate with. (gtf/gff only) (default: gene)
--chr
Specify a subset of chromosomes, separated by commas. (default: All)
--outFile, -O
Output file name (shared for all outputs). (default: outFile)
--writeVariants
Whether to write out pre-windows variants file. (default: True)
--writeWindows
Whether to write out windows stats file. (default: True)
--doAnnotation
Whether to annotate the final output. (default: True)
--includeReads
Whether to include median read counts. (default: False)
--moreQuality
Whether to include PE count, SR count and median MAPQ. (default: True)
--suppressWarnings
Whether to silence warning messages (not errors). (default: True)
--filterGaps
Whether to remove variants overlapping known gaps. (default: False)
--gapsFile
Reference csv file (without header) containing chromosome regions for exclusion.
Example:
CHR01 | 100 | 300 |
CHR01 | 600 | 750 |
CHR02 | 320 | 340 |
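As a rough illustration of what gap filtering involves, the sketch below loads a headerless gaps file and tests whether a variant overlaps any listed region. It is an assumption-laden example rather than SV-Pop's implementation: the names `load_gaps` and `overlaps_gap` are hypothetical, the file is assumed to be comma-separated with chromosome, start, and end columns, and coordinates are treated as inclusive.

```python
import csv
import io

# Inlined stand-in for a headerless gaps file (comma-separated is an assumption).
GAPS_TEXT = "CHR01,100,300\nCHR01,600,750\nCHR02,320,340\n"


def load_gaps(handle):
    """Group gap intervals by chromosome as lists of (start, end) tuples."""
    gaps = {}
    for chrom, start, end in csv.reader(handle):
        gaps.setdefault(chrom, []).append((int(start), int(end)))
    return gaps


def overlaps_gap(gaps, chrom, start, end):
    """True if [start, end] shares at least one position with a gap on chrom."""
    return any(
        start <= gap_end and end >= gap_start
        for gap_start, gap_end in gaps.get(chrom, [])
    )


gaps = load_gaps(io.StringIO(GAPS_TEXT))
print(overlaps_gap(gaps, "CHR01", 250, 400))  # variant runs into the 100-300 gap
print(overlaps_gap(gaps, "CHR01", 301, 599))  # variant sits between two gaps
```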
--writeSamples
Whether to write out sample IDs with each variant. (default: True)
--mergeChr
Merge per-chromosome files into one file. (default: True)
--doFst
Whether to calculate Fst values for sub-populations. (default: True)
--windowSize
Window size. (default: 1000)
--windowStep
Window step. (default: 500)
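With the default size of 1000 and step of 500, consecutive windows overlap by half. The sketch below shows one way such windows could be laid out; it assumes 1-based inclusive coordinates and clips the final windows at the chromosome end, which may differ from how SV-Pop positions its windows. The generator name `windows` is hypothetical.

```python
def windows(chrom_length, size=1000, step=500):
    """Yield (start, end) window coordinates, 1-based and inclusive."""
    start = 1
    while start <= chrom_length:
        # Clip the window so it never runs past the chromosome end.
        yield start, min(start + size - 1, chrom_length)
        start += step


for start, end in windows(2000):
    print(start, end)
```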
--doFiltering
Apply filtering step prior to output. (default: True)
--minLength
Minimum length of structural variants. (default: 0)
--maxLength
Maximum length of structural variants. (default: 100000)
--minimumQuality
Minimum threshold for Quality column. (default: 0.9)
--minimumPrecision
Minimum threshold for Precision column. (default: None)
--maximumPercRef
Maximum threshold for perc homozygous reference. (default: 0.1)
--maximumPercHet
Maximum threshold for perc heterozygous. (default: 0.3)
--minimumPercAlt
Minimum threshold for perc homozygous alternative. (default: None)
--maximumPercMissing
Maximum threshold for perc homozygous missing. (default: 0.1)
--removeIntergenic
Whether to remove purely intergenic variants. (default: False)
--excludeDupPercHet
Whether to exclude dups from PercHet filter. (default: False)
--minimumPE
Minimum supporting paired reads per-sample per-variant. (default: 0)
--minimumSR
Minimum supporting split reads per-sample per-variant. (default: 0)
--minimumMAPQ
Minimum median mapping quality per-sample per-variant. (default: 0)
--filterConcordance
Whether to filter variants by concordance. (default: False)
--dirConcordance
Path to directory containing secondary variant files, each named .cnvs. (default: ./)
Example:
Chromosome | Start | End | Type | Length (optional) | Source (optional) |
---|---|---|---|---|---|
CHR_01 | 201 | 432 | DEL | 231 | SV-Finder |
CHR_01 | 620 | 820 | DUP | 200 | FauxDUP |
CHR_02 | 8671 | 8720 | DEL | 49 | SV-Finder |
--overlapConcordance
Proportion of overlap required for verification. (default: 0.8)
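To make the overlap threshold concrete, the sketch below computes the fraction of a primary variant covered by a secondary call. This is an illustrative assumption: the source does not state whether SV-Pop measures overlap relative to the primary variant, the secondary variant, or both, and the function name `overlap_proportion` is hypothetical. Coordinates are treated as inclusive.

```python
def overlap_proportion(a_start, a_end, b_start, b_end):
    """Fraction of interval A (the primary variant) covered by interval B."""
    shared = min(a_end, b_end) - max(a_start, b_start) + 1
    return max(shared, 0) / (a_end - a_start + 1)


# A 200 bp primary call at 201-400 against a secondary call at 251-500:
# 150 shared positions out of 200.
proportion = overlap_proportion(201, 400, 251, 500)
print(proportion, proportion >= 0.8)
```

Under this definition the example pair would fall short of the default 0.8 threshold, while two calls with identical coordinates would score 1.0.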
--percConcordance
Percentage of concordance required per-variant. (default: 0)
--multithread
Whether to split analysis over multiple cores. (default: True)
--threads
Number of cores to utilise when multi-processing. (default: Depends on your machine)
--help, -h
Display this information.
--version, -v
Display full version information.
--CONVERT
Convert a variant output file into a window file.
--FILTER
Filter a variant output file by a range of factors.
--MERGE-CHR
Merge per-chromosome variants files into one file.
--MERGE-MODEL
Merge by-model variants files into one file.
--SUBSET
Create a subset of a given variant or window file.
--STATS
Produce summary statistics for a variant or window file.
--PREPROCESS
Process analysis output files for visualisation.