4. Genotyping genomic regions - abyzovlab/CNVpytor GitHub Wiki

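Use the -genotype option followed by one or more bin sizes, then enter regions in the form chr:start-end, one per line. For each region entered, a result line is printed with the region followed by one genotype value (CNV level) per bin size, in the order the bin sizes were given: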

> cnvpytor -root file.pytor -genotype 10000 100000
12:11396601-11436500
12:11396601-11436500    1.933261    1.937531
22:20999401-21300400
22:20999401-21300400    1.949186    1.957068
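To post-process this output in a script, the result lines can be turned into a per-region table. The following is a minimal sketch, assuming each result line is whitespace-separated and holds the region followed by exactly one value per bin size, in the same order as on the command line. The function name `parse_genotype_output` is hypothetical and not part of CNVpytor.

```python
from typing import Dict, Iterable, List


def parse_genotype_output(lines: Iterable[str],
                          bin_sizes: List[int]) -> Dict[str, Dict[int, float]]:
    """Map each region to {bin_size: value} from `-genotype` output lines.

    Assumes a result line is the region followed by one value per bin size,
    in the order the bin sizes were given on the command line. Lines holding
    only a region (the typed input) and blank lines are skipped.
    """
    results: Dict[str, Dict[int, float]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != len(bin_sizes) + 1:
            continue  # typed input region, blank line, or unexpected shape
        region, values = fields[0], fields[1:]
        results[region] = {size: float(v) for size, v in zip(bin_sizes, values)}
    return results


# Output copied from the example above (bin sizes 10000 and 100000).
output = """\
12:11396601-11436500
12:11396601-11436500    1.933261    1.937531
22:20999401-21300400
22:20999401-21300400    1.949186    1.957068
"""
levels = parse_genotype_output(output.splitlines(), [10000, 100000])
print(levels["22:20999401-21300400"][100000])  # 1.957068
```

Keying the inner dictionary by bin size keeps the values for different resolutions side by side, which makes it easy to compare how stable a region's genotype is across bin sizes.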

Genotyping with additional information (option -a):

> cnvpytor -root file.pytor -genotype 10000 -a [-rd_use_mask] [-nomask]
12:11396601-11436500
12:11396601-11436500    2.0152  1.629621e+04    9.670589e+08    0.0000  0.0000  4156900 1.0000  50      4       0.0000  1.000000e+00

Output columns are:

  1. region,
  2. cnv level -- mean RD normalized to the mean autosomal RD level and expressed as copy number (about 2 for a diploid region, as in the examples above),
  3. e_val_1 -- e-value calculated using t-test statistics between RD statistics in the region and global RD statistics (a p-value scaled for multiple testing, so it can exceed 1),
  4. e_val_2 -- e-value from the probability of RD values within the region to be in the tails of a Gaussian distribution of binned RD,
  5. q0 -- fraction of reads mapped with q0 quality within the region,
  6. pN -- fraction of reference genome gaps (Ns) within the region,
  7. dG -- distance from the closest large (>100 bp) gap in the reference genome,
  8. proportion of bins used in the RD calculation (with option -rd_use_mask some bins can be filtered out),
  9. number of homozygous variants within the region,
  10. number of heterozygous variants (HETs) within the region,
  11. BAF level (difference from 0.5) for HETs, estimated using a maximum likelihood method,
  12. p-value based on the BAF signal.
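When these columns are consumed downstream, naming them explicitly avoids off-by-one mistakes. Below is a minimal sketch, assuming each `-a` result line has exactly the 12 whitespace-separated columns listed above; the class `GenotypeRecord`, its field names and the function `parse_extended_line` are hypothetical and not part of CNVpytor.

```python
from dataclasses import dataclass


@dataclass
class GenotypeRecord:
    """One line of `cnvpytor -genotype <bin_size> -a` output (12 columns)."""
    region: str          # 1. region
    cnv_level: float     # 2. mean RD normalized to autosomal level
    e_val_1: float       # 3. t-test based e-value
    e_val_2: float       # 4. Gaussian-tail based e-value
    q0: float            # 5. fraction of q0 reads
    pN: float            # 6. fraction of reference gaps (Ns)
    dG: int              # 7. distance to closest large gap
    bin_fraction: float  # 8. proportion of bins used for RD
    n_hom: int           # 9. homozygous variants
    n_het: int           # 10. heterozygous variants
    baf_level: float     # 11. BAF difference from 0.5
    baf_p_value: float   # 12. BAF-based p-value


def parse_extended_line(line: str) -> GenotypeRecord:
    fields = line.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}: {line!r}")
    return GenotypeRecord(
        region=fields[0],
        cnv_level=float(fields[1]),
        e_val_1=float(fields[2]),
        e_val_2=float(fields[3]),
        q0=float(fields[4]),
        pN=float(fields[5]),
        dG=int(float(fields[6])),  # tolerate either integer or float notation
        bin_fraction=float(fields[7]),
        n_hom=int(fields[8]),
        n_het=int(fields[9]),
        baf_level=float(fields[10]),
        baf_p_value=float(fields[11]),
    )


# Result line copied from the -a example above.
line = ("12:11396601-11436500    2.0152  1.629621e+04    9.670589e+08    "
        "0.0000  0.0000  4156900 1.0000  50      4       0.0000  1.000000e+00")
rec = parse_extended_line(line)
print(rec.cnv_level, rec.n_hom, rec.n_het)  # 2.0152 50 4
```

Raising on an unexpected column count, rather than silently padding, makes a format change in the tool's output visible immediately.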

Option -rd_use_mask turns on P filtering (1000 Genomes Project strict mask) of the RD signal.

Option -nomask turns off P filtering (1000 Genomes Project strict mask) of SNPs used for the BAF signal.
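When genotyping is driven from a script, it can help to assemble the command line in one place so the optional flags are combined consistently. This is a minimal sketch that only uses the options shown on this page (-root, -genotype, -a, -rd_use_mask, -nomask); the helper names `genotype_command` and `run_genotype` are hypothetical, and whether the mask options have any effect without -a is not stated here, so the sketch simply appends whatever is requested.

```python
import subprocess
from typing import Iterable, List


def genotype_command(root: str, bin_sizes: Iterable[int], extended: bool = False,
                     rd_use_mask: bool = False, nomask: bool = False) -> List[str]:
    """Build an argv list for `cnvpytor -root <root> -genotype <bin sizes...>`."""
    cmd = ["cnvpytor", "-root", root, "-genotype", *map(str, bin_sizes)]
    if extended:
        cmd.append("-a")            # print the 12-column extended output
    if rd_use_mask:
        cmd.append("-rd_use_mask")  # strict-mask (P) filtering of the RD signal
    if nomask:
        cmd.append("-nomask")       # no strict-mask filtering of SNPs for BAF
    return cmd


def run_genotype(cmd: List[str], regions: Iterable[str]) -> str:
    """Feed regions on stdin, one per line, and return cnvpytor's stdout."""
    result = subprocess.run(cmd, input="\n".join(regions) + "\n",
                            capture_output=True, text=True, check=True)
    return result.stdout


cmd = genotype_command("file.pytor", [10000], extended=True, rd_use_mask=True)
print(" ".join(cmd))  # cnvpytor -root file.pytor -genotype 10000 -a -rd_use_mask
# stdout = run_genotype(cmd, ["12:11396601-11436500"])  # needs cnvpytor on PATH
```

Passing an argument list to `subprocess.run` instead of a shell string avoids quoting problems with file names.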

Example:

Genotype all called CNVs by piping the region column (column 2) of a call file to cnvpytor through standard input:

> awk '{ print $2 }' calls.10000.tsv | cnvpytor -root file.pytor -genotype 10000 100000
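The awk step can also be done in Python when the regions need extra filtering first. This is a minimal sketch of the same column extraction; the function `regions_from_calls` is hypothetical, the optional `cnv_type` filter assumes that column 1 of the call file holds the CNV type (e.g. "deletion"), and the two input rows are made up for illustration only.

```python
import io
from typing import List, Optional, TextIO


def regions_from_calls(handle: TextIO, cnv_type: Optional[str] = None) -> List[str]:
    """Collect the region column (column 2) of a call file.

    Mirrors `awk '{ print $2 }' calls.10000.tsv`, except that lines with fewer
    than two columns are skipped rather than printed as empty lines.
    If cnv_type is given, only lines whose first column equals it are kept
    (assumption: column 1 holds the CNV type).
    """
    regions: List[str] = []
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue
        if cnv_type is not None and fields[0] != cnv_type:
            continue
        regions.append(fields[1])
    return regions


# Hypothetical call-file rows, for illustration only.
calls = io.StringIO(
    "deletion\t1:10001-20000\t10000\n"
    "duplication\t2:50001-70000\t20000\n"
)
all_regions = regions_from_calls(calls)
calls.seek(0)
deletions = regions_from_calls(calls, cnv_type="deletion")
print(all_regions)  # ['1:10001-20000', '2:50001-70000']
print(deletions)    # ['1:10001-20000']
```

Written one per line, the returned regions can be passed to `cnvpytor -root file.pytor -genotype 10000 100000` on standard input, exactly like the awk output.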