4. Genotyping genomic regions - abyzovlab/CNVpytor GitHub Wiki

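The -genotype option, followed by one or more bin sizes, reads regions from standard input and calculates the genotype of each region for every bin size. In the session below, each region is typed on its own line and the result is printed on the next line, with one value per bin size: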

> cnvpytor -root file.pytor -genotype 10000 100000
12:11396601-11436500
12:11396601-11436500    1.933261    1.937531
22:20999401-21300400
22:20999401-21300400    1.949186    1.957068
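If the results need further processing, the plain-text output can be parsed with a few lines of Python. The following is a minimal sketch, not part of CNVpytor: the function name `parse_genotype_output` is hypothetical, and it assumes that values are printed in the same order as the bin sizes given on the command line.

```python
def parse_genotype_output(text, bin_sizes):
    """Map each region to {bin_size: value} from -genotype output text.

    Assumption: values appear in the same order as the bin sizes that
    were passed on the command line.
    """
    results = {}
    for line in text.splitlines():
        fields = line.split()
        # Lines holding only a region (typed or echoed input) are skipped;
        # result lines carry the region plus one value per bin size.
        if len(fields) != len(bin_sizes) + 1:
            continue
        region, values = fields[0], fields[1:]
        results[region] = {size: float(v) for size, v in zip(bin_sizes, values)}
    return results


# The session shown above, pasted as text.
sample = """12:11396601-11436500
12:11396601-11436500    1.933261    1.937531
22:20999401-21300400
22:20999401-21300400    1.949186    1.957068
"""
genotypes = parse_genotype_output(sample, [10000, 100000])
```

Here `genotypes["12:11396601-11436500"][10000]` holds the value computed with 10000 bp bins for the first region.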

Genotyping with additional information (option -a):

> cnvpytor -root file.pytor -genotype 10000 -a [-rd_use_mask] [-nomask]
12:11396601-11436500
12:11396601-11436500    2.0152  1.629621e+04    9.670589e+08    0.0000  0.0000  4156900 1.0000  50      4       0.0000  1.000000e+00

Output columns are:

region,
cnv level -- mean RD normalized to the mean autosomal RD level,
e_val_1 -- p-value from a t-test comparing RD statistics in the region with the global RD statistics,
e_val_2 -- p-value from the probability that RD values within the region fall in the tails of a Gaussian distribution of binned RD,
q0 -- fraction of reads mapped with q0 quality within the region,
pN -- fraction of reference genome gaps (Ns) within the region,
dG -- distance from the closest large (>100 bp) gap in the reference genome,
proportion of bins used in the RD calculation (with option -rd_use_mask some bins can be filtered out),
number of homozygous variants within the region,
number of heterozygous variants within the region,
BAF level (difference from 0.5) for heterozygous variants, estimated using the maximum likelihood method,
p-value based on the BAF signal.
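The twelve columns listed above can be given names when a result line is read into a script. This is a sketch under assumptions: the key names in `COLUMNS` are labels chosen here for illustration (CNVpytor does not print a header), and the line is assumed to come from a run with a single bin size, as in the example above.

```python
# Illustrative labels for the twelve output columns, in printed order.
COLUMNS = [
    "region", "cnv_level", "e_val_1", "e_val_2", "q0", "pN", "dG",
    "bins_used_fraction", "n_hom", "n_het", "baf_level", "baf_p_value",
]
COUNT_COLUMNS = {"n_hom", "n_het"}  # variant counts are stored as integers


def parse_annotated_line(line):
    """Turn one '-genotype ... -a' result line into a dict keyed by COLUMNS."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    record = {"region": fields[0]}
    for name, value in zip(COLUMNS[1:], fields[1:]):
        number = float(value)
        record[name] = int(number) if name in COUNT_COLUMNS else number
    return record


line = ("12:11396601-11436500    2.0152  1.629621e+04    9.670589e+08    "
        "0.0000  0.0000  4156900 1.0000  50      4       0.0000  1.000000e+00")
record = parse_annotated_line(line)
```

For the example line, `record["n_hom"]` is 50 and `record["n_het"]` is 4, matching the ninth and tenth columns.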

Option -rd_use_mask turns on P filtering (1000 Genomes Project strict mask) for the RD signal.

Option -nomask turns off P filtering of SNPs (1000 Genomes Project strict mask) for the BAF signal.

Example:

Genotype all called CNVs:

> awk '{ print $2 }' calls.10000.tsv | cnvpytor -root file.pytor -genotype 10000 100000
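The same pipeline can be driven from Python when the results are needed inside a script. The sketch below is illustrative only: `regions_from_calls` and `genotype_calls` are hypothetical helper names, the region is assumed to be the second whitespace-separated field of each line in the calls file (as the awk command implies), and `cnvpytor` is assumed to be on the PATH.

```python
import subprocess


def regions_from_calls(calls_text):
    """Return the second whitespace-separated field of each line,
    mirroring awk '{ print $2 }' (lines with fewer fields are skipped)."""
    regions = []
    for line in calls_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            regions.append(fields[1])
    return regions


def genotype_calls(calls_path, pytor_file, bin_sizes):
    """Pipe the regions from a calls file into 'cnvpytor -genotype'
    and return the text written to standard output."""
    with open(calls_path) as handle:
        regions = regions_from_calls(handle.read())
    command = ["cnvpytor", "-root", pytor_file,
               "-genotype", *[str(size) for size in bin_sizes]]
    completed = subprocess.run(
        command,
        input="\n".join(regions) + "\n",  # one region per line on stdin
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if cnvpytor exits with an error
    )
    return completed.stdout
```

Calling `genotype_calls("calls.10000.tsv", "file.pytor", [10000, 100000])` would correspond to the shell command above.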