3. Import SNP data - abyzovlab/CNVpytor GitHub Wiki

From variant file

To import variant data from a VCF file, use the following command:

> cnvpytor -root file.pytor -snp file.vcf.gz [-sample sample_name] [-chrom name1 ...] [-ad AD_TAG] [-gt GT_TAG] [-noAD]

where:

  • file.pytor -- specifies the cnvpytor file,
  • file.vcf.gz -- specifies the variant file name,
  • sample_name -- specifies the VCF sample name,
  • name1 ... -- specifies chromosome name(s),
  • -ad AD_TAG -- specifies the AD tag used in the VCF file (default: AD),
  • -gt GT_TAG -- specifies the GT tag used in the VCF file (default: GT),
  • -noAD -- ref and alt read counts will not be read from the VCF file (see the next section).

Chromosome names must be specified exactly as they appear in the VCF header, e.g., chrX or X. Multiple chromosomes can be specified, separated by spaces. If no chromosome is specified, all chromosomes from the VCF file will be parsed.
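
To check which spelling a given file uses, the contig names can be listed from the VCF header before running the import. The following is a minimal Python sketch, not part of CNVpytor; the function name `vcf_contig_names` is hypothetical, and it assumes the header declares contigs with standard `##contig=<ID=...>` lines (some VCF files omit them).

```python
import gzip
import re


def vcf_contig_names(path):
    """Return the contig IDs declared in the ##contig lines of a VCF header."""
    # Plain and gzip/bgzip-compressed VCF files are both handled.
    opener = gzip.open if path.endswith(".gz") else open
    names = []
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # meta-information lines end at the #CHROM line
            match = re.match(r"##contig=<ID=([^,>]+)", line)
            if match:
                names.append(match.group(1))
    return names
```

The returned names (for example `chr1` or `1`) are the ones to pass to `-chrom`.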

If chromosome names in the variant and alignment files differ only by the chr prefix (e.g., "1" and "chr1"), cnvpytor will detect this and match the names, using the first imported name for both signals.
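
The idea behind this matching can be illustrated with a short Python sketch. This is not CNVpytor's internal code; the helpers `strip_chr` and `match_chromosomes` are hypothetical names, and the sketch assumes the only difference between the two naming schemes is a leading `chr`.

```python
def strip_chr(name):
    """Drop a leading 'chr' prefix, if present."""
    return name[3:] if name.startswith("chr") else name


def match_chromosomes(first_names, second_names):
    """Map each later-imported name to the first-imported name it matches."""
    by_core = {strip_chr(name): name for name in first_names}
    return {
        name: by_core[strip_chr(name)]
        for name in second_names
        if strip_chr(name) in by_core
    }


# Names imported first use "1", "2", "X"; the second file uses the chr prefix.
print(match_chromosomes(["1", "2", "X"], ["chr1", "chrX", "chrM"]))
```

Names with no counterpart in the first file (here `chrM`) are simply left out of the mapping.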

Using SNP positions from variant file and counts from alignment file

In some cases it is useful to read SNP positions from a VCF file and extract read counts from a BAM file. For example, given two samples, normal tissue and cancer, the normal sample can be used to call germline SNPs, while a samtools mpileup procedure can be used to calculate read counts in the cancer sample at the positions of those SNPs. CNVpytor implements this procedure. After reading the SNP positions (previous step), type:

> cnvpytor -root file.pytor -pileup file.bam [-T ref.fa.gz]

where:

  • file.pytor -- specifies the cnvpytor file,
  • file.bam -- specifies the BAM/SAM/CRAM file,
  • -T ref.fa.gz -- specifies the reference genome file (only for CRAM files without a reference genome).
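
The two-step order (SNP positions first, then pileup counts) can be scripted. Below is a hedged Python sketch that only assembles the two command lines using the flags shown on this page; the function name and the file names (`tumor.pytor`, `normal.vcf.gz`, `tumor.bam`) are placeholders, not files or APIs provided by CNVpytor.

```python
import shlex


def snp_then_pileup_commands(root, vcf, bam, reference=None):
    """Build the two cnvpytor invocations, in the order they must be run."""
    # Step 1: read SNP positions from the VCF file into the .pytor file.
    snp_cmd = ["cnvpytor", "-root", root, "-snp", vcf]
    # Step 2: count reads from the alignment file at those positions.
    pileup_cmd = ["cnvpytor", "-root", root, "-pileup", bam]
    if reference is not None:
        # Only needed for CRAM input without a reference genome.
        pileup_cmd += ["-T", reference]
    return [snp_cmd, pileup_cmd]


for cmd in snp_then_pileup_commands("tumor.pytor", "normal.vcf.gz", "tumor.bam"):
    print(shlex.join(cmd))
```

To execute the steps rather than print them, each list can be passed to `subprocess.run(cmd, check=True)`, which stops at the first failing command.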

Calculating BAF histograms

To apply the 1000 Genomes strict mask filter:

> cnvpytor -root file.pytor -mask_snps

To calculate histograms of MAF and BAF, together with the BAF likelihood function, for the given bin sizes (here 10,000 and 100,000), use:

> cnvpytor -root file.pytor -baf 10000 100000 [-nomask]
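
For intuition about the quantities involved, here is a minimal Python sketch of BAF, MAF, and a per-bin likelihood computed from ref/alt read counts. It is an illustration under stated assumptions, not CNVpytor's implementation: it assumes BAF = alt / (ref + alt), MAF = min(BAF, 1 - BAF), and a symmetrized binomial likelihood multiplied over the SNPs in a bin. All function names are hypothetical, and the filtering and masking CNVpytor applies are not shown.

```python
def baf(ref_count, alt_count):
    """B-allele frequency: fraction of reads supporting the alt allele."""
    return alt_count / (ref_count + alt_count)


def maf(ref_count, alt_count):
    """Minor allele frequency: the smaller of BAF and 1 - BAF."""
    b = baf(ref_count, alt_count)
    return min(b, 1.0 - b)


def bin_likelihood(snps, resolution=100):
    """Likelihood of allele fraction p on a grid, for the SNPs in one bin.

    snps is a list of (ref_count, alt_count) pairs. Each SNP contributes
    p^alt (1-p)^ref + p^ref (1-p)^alt, symmetric because the haplotype
    carrying the alt allele is unknown.
    """
    grid = [i / resolution for i in range(resolution + 1)]
    likelihood = [1.0] * len(grid)
    for ref, alt in snps:
        for i, p in enumerate(grid):
            likelihood[i] *= p**alt * (1 - p) ** ref + p**ref * (1 - p) ** alt
        # Renormalize after every SNP to limit floating-point underflow.
        total = sum(likelihood)
        likelihood = [value / total for value in likelihood]
    return grid, likelihood


grid, lik = bin_likelihood([(10, 10), (12, 8)])
peak = grid[max(range(len(lik)), key=lik.__getitem__)]
print(baf(30, 10), maf(10, 30), peak)
```

In a balanced region the likelihood peaks at 0.5; an allelic imbalance moves the peak to a pair of values symmetric around 0.5.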