Tool 1: vt - Dioufamad/SNPs_Calling GitHub Wiki

Purpose: sort and output variants

  • Views a VCF, VCF.GZ, or BCF file:
  #views mills.bcf on standard output, omitting the header (-h)
  vt view -h mills.bcf
usage : vt view [options] <in.vcf>
 options : -o  output VCF/VCF.GZ/BCF file [-]
           -f  filter expression []
           -w  local sorting window size [0]
           -s  print site information only without genotypes [false]
           -H  print header only, this option is honored only for STDOUT [false]
           -h  omit header, this option is honored only for STDOUT [false]
           -p  print options and summary []
           -r  right window size for overlap []
           -l  left window size for overlap []
           -c  compression level 0-9; 0 and -1 denote uncompressed output, with the former being wrapped in bgzf. [6]
               Note: try 4, a good trade-off between run time and file size.
           -t  bed file for variant selection via streaming []
           -I  file containing list of intervals []
           -i  intervals []
           -?  displays help
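
As a rough illustration of how the options above combine, the sketch below wraps two `vt view` invocations in shell functions. The function names, the `VT` override variable, and the file names are hypothetical; only flags listed in the usage text are used.

```shell
#!/bin/sh
# Sketch only: helper names and file names are hypothetical.
# VT can be overridden (e.g. VT="echo vt") to preview commands without running vt.
VT="${VT:-vt}"

# Print only the header of a file to standard output (-H).
view_header() {
    $VT view -H "$1"
}

# Write site-level records without genotypes (-s) to an output file,
# using compression level 4 as suggested in the note on -c.
view_sites() {
    $VT view -s -c 4 "$1" -o "$2"
}
```

For example, `view_sites mills.bcf sites.bcf` would run `vt view -s -c 4 mills.bcf -o sites.bcf`.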
  • Indexes a VCF.GZ or BCF file.
  #indexes mills.bcf
  vt index mills.bcf 
  #indexes mills.vcf.gz
  vt index mills.vcf.gz
  • Sorts a VCF, VCF.GZ, or BCF file:
  #sorts mills.bcf with no assumptions
  vt sort mills.bcf -o out.bcf

usage : vt sort [options] <in.vcf>
 options : -m  sorting modes. [full]
               local : locally sort within a 1000bp window.  Window size may be set by -w.
               chrom : sort chromosomes based on order of contigs in header.
                       input must be indexed.
               full  : full sort with no assumptions.
           -o  output VCF/VCF.GZ/BCF file. [-]
           -w  local sorting window size, set by default to 1000 under local mode. [0]
           -p  print options and summary. []
           -?  displays help
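
The three sorting modes differ mainly in what they assume about the input. A minimal sketch of the two non-default modes follows, assuming hypothetical helper names and file names; the `chrom` helper indexes first because the usage text states that the input must be indexed.

```shell
#!/bin/sh
# Sketch only: helper names and file names are hypothetical.
# VT can be overridden (e.g. VT="echo vt") to preview commands without running vt.
VT="${VT:-vt}"

# Local sort: $1 = input, $2 = window size in bp (-w), $3 = output.
sort_local() {
    $VT sort -m local -w "$2" "$1" -o "$3"
}

# Chromosome-order sort: index the input, then sort by header contig order.
# $1 = input (VCF.GZ or BCF, since those are the formats vt index accepts), $2 = output.
sort_chrom() {
    $VT index "$1" && $VT sort -m chrom "$1" -o "$2"
}
```

Running `sort_chrom mills.bcf out.bcf` would call `vt index mills.bcf` and then `vt sort -m chrom mills.bcf -o out.bcf`.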
  • Normalizes variants in a VCF file:
  #normalize variants and write out to dbsnp.normalized.vcf
  vt normalize dbsnp.vcf -r seq.fa -o dbsnp.normalized.vcf
  #variants that are normalized will be annotated with an OLD_VARIANT info tag.
  #CHROM  POS       ID  REF           ALT  QUAL  FILTER  INFO
  19      29238772  .   C             G    .     PASS    VT=SNP;OLD_VARIANT=19:29238771:TC/TG
  20      60674709  .   GCCCAGCCCCAC  G    .     PASS    VT=INDEL;OLD_VARIANT=20:60674718:CACCCCAGCCCC/C
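
To see which records were changed, the OLD_VARIANT tag can be pulled out of the INFO column. The following is a minimal sketch using awk, assuming a tab-delimited VCF on standard input with INFO in column 8; the function name `old_variants` is hypothetical.

```shell
#!/bin/sh
# Sketch only: list normalized records next to their original representation.
# Reads tab-delimited VCF text on stdin and skips lines starting with '#'.
old_variants() {
    awk -F'\t' '
        /^#/ { next }
        {
            n = split($8, info, ";")
            for (i = 1; i <= n; i++) {
                if (index(info[i], "OLD_VARIANT=") == 1) {
                    # Print CHROM, POS, REF, ALT, then the pre-normalization variant.
                    printf "%s\t%s\t%s\t%s\t%s\n", $1, $2, $4, $5, substr(info[i], 13)
                }
            }
        }'
}
```

A possible use is `old_variants < dbsnp.normalized.vcf`; records without an OLD_VARIANT tag are not printed.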
  #this shows a sample output with the normalization operations that were used,
  #categorized into 5 categories each for biallelic and multiallelic variants.

  stats: biallelic
         no. left trimmed                      : 156908
         no. right trimmed                     : 323
         no. left and right trimmed            : 33
         no. right trimmed and left aligned    : 7
         no. left aligned                      : 12360 

      total no. biallelic normalized           : 169631 
 

      multiallelic
         no. left trimmed                      : 627189
         no. right trimmed                     : 2509
         no. left and right trimmed            : 1498
         no. right trimmed and left aligned    : 212
         no. left aligned                      : 1783 

      total no. multiallelic normalized        : 633191 

      total no. variants normalized            : 802822
      total no. variants observed              : 88052639
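
In the sample summary, each total is the sum of its five category counts, and the overall total is the sum of the biallelic and multiallelic totals. A small sketch that re-derives those totals with shell arithmetic, using only the numbers shown above (the variable names are illustrative):

```shell
#!/bin/sh
# Category counts copied from the sample summary above.
biallelic=$((156908 + 323 + 33 + 7 + 12360))
multiallelic=$((627189 + 2509 + 1498 + 212 + 1783))
normalized=$((biallelic + multiallelic))
observed=88052639

echo "biallelic normalized:    $biallelic"
echo "multiallelic normalized: $multiallelic"
echo "total normalized:        $normalized"
# Share of observed variants that normalization changed, in percent.
awk -v n="$normalized" -v t="$observed" \
    'BEGIN { printf "percent normalized:      %.2f\n", 100 * n / t }'
```

The three sums reproduce the reported totals (169631, 633191, and 802822), so under 1% of the observed variants were modified in this run.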
usage : vt normalize [options] <in.vcf>
 options : -o  output VCF file [-]
           -d  debug [false]
           -q  do not print options and summary [false]
           -m  warns but does not exit when REF is inconsistent
               with masked reference sequence for non SNPs.
               This overrides the -n option [false]
           -n  warns but does not exit when REF is inconsistent
               with reference sequence for non SNPs [false]
           -w  window size for local sorting of variants [10000]
           -I  file containing list of intervals []
           -i  intervals []
           -r  reference sequence fasta file []
           -?  displays help
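
When the reference and the VCF may disagree at a few sites, the `-n` option described above lets the run continue with warnings. A minimal sketch of such a call, assuming a hypothetical helper name and file names, with the window size defaulting to the documented 10000:

```shell
#!/bin/sh
# Sketch only: helper name and file names are hypothetical.
# VT can be overridden (e.g. VT="echo vt") to preview commands without running vt.
VT="${VT:-vt}"

# $1 = input VCF, $2 = reference FASTA, $3 = output VCF,
# $4 = optional local sorting window size (-w), default 10000.
normalize_lenient() {
    $VT normalize "$1" -r "$2" -n -w "${4:-10000}" -o "$3"
}
```

For instance, `normalize_lenient dbsnp.vcf seq.fa dbsnp.normalized.vcf` would run `vt normalize dbsnp.vcf -r seq.fa -n -w 10000 -o dbsnp.normalized.vcf`.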