Tool 1: vt - Dioufamad/SNPs_Calling GitHub Wiki
Purpose: sort and extract variants
- Views a VCF or VCF.GZ or BCF file.
#views mills.bcf without its header (-h) and outputs to standard out
vt view -h mills.bcf
usage : vt view [options] <in.vcf>
options : -o output VCF/VCF.GZ/BCF file [-]
-f filter expression []
-w local sorting window size [0]
-s print site information only without genotypes [false]
-H print header only, this option is honored only for STDOUT [false]
-h omit header, this option is honored only for STDOUT [false]
-p print options and summary []
-r right window size for overlap []
-l left window size for overlap []
-c compression level 0-9, 0 and -1 denotes uncompressed with the former being wrapped in bgzf. [6]
(note: level 4 is worth trying, it seemed a good trade-off between run time and file size)
-t bed file for variant selection via streaming []
-I file containing list of intervals []
-i intervals []
-? displays help
- Indexes a VCF.GZ or BCF file.
#indexes mills.bcf
vt index mills.bcf
#indexes mills.vcf.gz
vt index mills.vcf.gz
- Sorts a VCF or VCF.GZ or BCF file.
#sorts mills.bcf with no assumption
vt sort mills.bcf -o out.bcf
usage : vt sort [options] <in.vcf>
options : -m sorting modes. [full]
    local : locally sort within a 1000bp window. Window size may be set by -w.
    chrom : sort chromosomes based on order of contigs in header.
            input must be indexed.
    full : full sort with no assumptions.
-o output VCF/VCF.GZ/BCF file. [-]
-w local sorting window size, set by default to 1000 under local mode. [0]
-p print options and summary. []
-? displays help
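To make the `local` mode more concrete, here is a minimal Python sketch of sorting within a window. It is an illustration of the general idea only, not vt's implementation: `local_sort` and the `(position, payload)` record shape are hypothetical, and the sketch assumes a single chromosome where no record is more than `window` bp out of place.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[int, str]  # (position, payload) on one chromosome (assumed shape)


def local_sort(records: Iterable[Record], window: int = 1000) -> Iterator[Record]:
    """Yield records in position order using a bounded buffer.

    Hypothetical helper, not part of vt. Correct only if every record is
    at most `window` bp out of place in the input.
    """
    heap: List[Record] = []
    for rec in records:
        heapq.heappush(heap, rec)
        # A buffered record more than `window` bp behind the record just
        # read cannot be preceded by anything still to come, so emit it.
        while heap and heap[0][0] < rec[0] - window:
            yield heapq.heappop(heap)
    # Flush whatever is still buffered at end of input.
    while heap:
        yield heapq.heappop(heap)
```

Memory use stays bounded by the number of records in one window, which is why a local sort is cheaper than the `full` mode when the input is already nearly sorted.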
- Normalizes variants in a VCF file.
#normalize variants and write out to dbsnp.normalized.vcf
vt normalize dbsnp.vcf -r seq.fa -o dbsnp.normalized.vcf
#variants that are normalized will be annotated with an OLD_VARIANT info tag.
#CHROM POS ID REF ALT QUAL FILTER INFO
19 29238772 . C G . PASS VT=SNP;OLD_VARIANT=19:29238771:TC/TG
20 60674709 . GCCCAGCCCCAC G . PASS VT=INDEL;OLD_VARIANT=20:60674718:CACCCCAGCCCC/C
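The two records above can be replayed with a short Python sketch of the usual trim-and-left-align procedure for biallelic variants. This is an illustration under stated assumptions, not vt's code: `parse_old_variant`, `normalize` and `base_at` are hypothetical names, and the reference fragment is reconstructed from the REF alleles of the old and new chromosome 20 records, not read from a FASTA file.

```python
from typing import Callable, Tuple


def parse_old_variant(tag: str) -> Tuple[str, int, str, str]:
    """Split an OLD_VARIANT value such as '19:29238771:TC/TG'."""
    chrom, pos, alleles = tag.split(":")
    ref, alt = alleles.split("/")
    return chrom, int(pos), ref, alt


def normalize(pos: int, ref: str, alt: str,
              base_at: Callable[[int], str]) -> Tuple[int, str, str]:
    """Trim and left-align a biallelic variant (1-based position)."""
    if ref == alt:
        raise ValueError("REF and ALT must differ")
    changed = True
    while changed:
        changed = False
        # Right trim: drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # Left align: an allele became empty, so extend both one base left.
        if not ref or not alt:
            pos -= 1
            base = base_at(pos)
            ref, alt = base + ref, base + alt
            changed = True
    # Left trim: drop shared leading bases while both alleles keep one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Reference bases 20:60674709-60674729, reconstructed from the REF alleles
# of the old and new records in the example (an assumption, not a genome lookup).
FRAGMENT_START = 60674709
FRAGMENT = "GCCCAGCCCCACCCCAGCCCC"


def base_at(pos: int) -> str:
    return FRAGMENT[pos - FRAGMENT_START]


_, old_pos, old_ref, old_alt = parse_old_variant("20:60674718:CACCCCAGCCCC/C")
print(normalize(old_pos, old_ref, old_alt, base_at))
```

The chromosome 19 record only needs a left trim (`TC/TG` becomes `C/G`, one base further right), while the chromosome 20 deletion is shifted left through the repeat until its alleles no longer end in the same base.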
#this shows a sample output with the normalization operations that were used
#categorized into 5 categories each for biallelic and multiallelic variants.
stats: biallelic
no. left trimmed : 156908
no. right trimmed : 323
no. left and right trimmed : 33
no. right trimmed and left aligned : 7
no. left aligned : 12360
total no. biallelic normalized : 169631
multiallelic
no. left trimmed : 627189
no. right trimmed : 2509
no. left and right trimmed : 1498
no. right trimmed and left aligned : 212
no. left aligned : 1783
total no. multiallelic normalized : 633191
total no. variants normalized : 802822
total no. variants observed : 88052639
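As a sanity check when reading such a summary, the five categories should add up to the per-class totals, and the two classes to the grand total. A small Python sketch using the numbers printed above (the variable names are ours, not vt's):

```python
biallelic = {
    "left trimmed": 156908,
    "right trimmed": 323,
    "left and right trimmed": 33,
    "right trimmed and left aligned": 7,
    "left aligned": 12360,
}
multiallelic = {
    "left trimmed": 627189,
    "right trimmed": 2509,
    "left and right trimmed": 1498,
    "right trimmed and left aligned": 212,
    "left aligned": 1783,
}
observed = 88052639

# Per-class totals are the sums of the five normalization categories.
total_biallelic = sum(biallelic.values())
total_multiallelic = sum(multiallelic.values())
total_normalized = total_biallelic + total_multiallelic

# Share of all observed variants that needed normalization.
fraction = total_normalized / observed
print(f"{total_normalized} of {observed} variants normalized ({fraction:.2%})")
```

In this sample fewer than 1% of the observed variants needed normalization, and most of those were simple left trims.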
usage : vt normalize [options] <in.vcf>
options : -o output VCF file [-]
-d debug [false]
-q do not print options and summary [false]
-m warns but does not exit when REF is inconsistent
with masked reference sequence for non SNPs.
This overrides the -n option [false]
-n warns but does not exit when REF is inconsistent
with reference sequence for non SNPs [false]
-w window size for local sorting of variants [10000]
-I file containing list of intervals []
-i intervals []
-r reference sequence fasta file []
-? displays help
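The sort, normalize and index commands documented above can be chained from a script. The following Python sketch shows one plausible order; `build_commands`, `run` and the file names are hypothetical, and it assumes that `vt normalize` picks its output format from the `-o` file extension so that the result can be indexed.

```python
import shutil
import subprocess
from typing import List


def build_commands(in_vcf: str, ref_fasta: str, out_bcf: str,
                   tmp_sorted: str = "sorted.tmp.bcf") -> List[List[str]]:
    """Return the vt invocations for sort -> normalize -> index.

    Only flags listed in the usage texts above are used.
    """
    return [
        ["vt", "sort", in_vcf, "-o", tmp_sorted],
        ["vt", "normalize", tmp_sorted, "-r", ref_fasta, "-o", out_bcf],
        ["vt", "index", out_bcf],
    ]


def run(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    if shutil.which("vt") is None:
        raise RuntimeError("vt not found on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Example (not executed here): run(build_commands("dbsnp.vcf", "seq.fa", "dbsnp.normalized.bcf"))
```

Building the argument lists separately from running them makes it easy to print and review the exact commands before launching a long job.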
- Understanding vt peek output: example of vt peek output (output 13)