4. Genotyping genomic regions - abyzovlab/CNVpytor GitHub Wiki
Use the -genotype option followed by one or more bin sizes, then enter a region. The genotype is calculated for each bin size and printed after the region, one value per bin size:
> cnvpytor -root file.pytor -genotype 10000 100000
12:11396601-11436500
12:11396601-11436500 1.933261 1.937531
22:20999401-21300400
22:20999401-21300400 1.949186 1.957068
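Since each output line carries one value per bin size in the order the bin sizes were given, it can help to pair every value with its bin size explicitly. The following is a minimal sketch, not part of CNVpytor: `label_genotypes` is a hypothetical helper, and the input lines are copied from the example output above.

```shell
# label_genotypes: hypothetical helper, not part of CNVpytor.
# Arguments: the same bin sizes that were passed to -genotype, in the same order.
# Reads genotype output lines (region followed by one value per bin size)
# from stdin and prints one "region<TAB>bin_size<TAB>value" row per bin size.
label_genotypes() {
    awk -v bins="$*" '
        BEGIN { n = split(bins, b, " ") }
        # Only process lines with a region plus exactly n values.
        NF == n + 1 {
            for (i = 1; i <= n; i++) printf "%s\t%s\t%s\n", $1, b[i], $(i + 1)
        }
    '
}

# Sample output lines copied from the example above.
printf '%s\n' \
    '12:11396601-11436500 1.933261 1.937531' \
    '22:20999401-21300400 1.949186 1.957068' |
    label_genotypes 10000 100000
```

Lines that do not have exactly one value per bin size are skipped, so a bare region line is ignored rather than mislabeled.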
Genotyping with additional information:
> cnvpytor -root file.pytor -genotype 10000 -a [-rd_use_mask] [-nomask]
12:11396601-11436500
12:11396601-11436500 2.0152 1.629621e+04 9.670589e+08 0.0000 0.0000 4156900 1.0000 50 4 0.0000 1.000000e+00
Output columns are:
- region,
- cnv level -- mean RD normalized to mean autosomal RD level,
- e_val_1 -- p-value from a t-test comparing RD statistics in the region with the global RD statistics,
- e_val_2 -- p-value from the probability that RD values within the region fall in the tails of a Gaussian distribution of binned RD,
- q0 -- fraction of reads mapped with q0 quality within the call region,
- pN -- fraction of reference genome gaps (Ns) within the call region,
- dG -- distance from the closest large (>100 bp) gap in the reference genome,
- proportion of bins used in the RD calculation (with option -rd_use_mask some bins can be filtered out),
- number of homozygous variants within the region,
- number of heterozygous variants,
- BAF level (difference from 0.5) for HETs, estimated using a maximum likelihood method,
- p-value based on the BAF signal.
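The output line has no header, so the twelve columns listed above have to be matched by position. As a rough sketch, the helper below prints each field next to a label. The function name `describe_genotype` and the field labels (`bin_fraction`, `n_hom`, `n_het`, `baf_level`, `baf_p_value`) are names chosen for this illustration only; CNVpytor does not print them. The input line is the example output shown above.

```shell
# describe_genotype: hypothetical helper, not part of CNVpytor.
# Field labels are illustrative names for the twelve columns described above.
describe_genotype() {
    awk '
        BEGIN {
            n = split("region cnv_level e_val_1 e_val_2 q0 pN dG bin_fraction n_hom n_het baf_level baf_p_value", name, " ")
        }
        # Only label lines that have exactly the expected number of columns.
        NF == n {
            for (i = 1; i <= n; i++) printf "%s=%s\n", name[i], $i
        }
    '
}

# Example output line copied from the genotyping run with -a above.
echo '12:11396601-11436500 2.0152 1.629621e+04 9.670589e+08 0.0000 0.0000 4156900 1.0000 50 4 0.0000 1.000000e+00' |
    describe_genotype
```

Printing one `label=value` pair per line makes it easy to spot a shifted column if the output format differs from the one described here.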
Option -rd_use_mask turns on P filtering (1000 Genomes Project strict mask) for RD signal.
Option -nomask turns off P filtering of SNPs (1000 Genomes Project strict mask) for BAF signal.
Example:
Genotype all called CNVs:
> awk '{ print $2 }' calls.10000.tsv | cnvpytor -root file.pytor -genotype 10000 100000
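The same pattern can be narrowed to a subset of calls before genotyping. The sketch below assumes that the first column of the calls file holds the CNV type (for example `deletion` or `duplication`) and the second column holds the region. Only the position of the region column is implied by the command above; the type column is an assumption to check against your own calls file. `select_regions` is a hypothetical helper, and the two input rows are made-up stand-ins for a real calls file.

```shell
# select_regions TYPE: hypothetical helper, not part of CNVpytor.
# Reads call rows from stdin and prints the region (column 2) of every row
# whose first column equals TYPE (assumed to be the CNV type).
select_regions() {
    awk -v type="$1" '$1 == type { print $2 }'
}

# Two made-up rows standing in for a calls file (type, region, size).
printf 'deletion\t12:11396601-11436500\t39900\nduplication\t22:20999401-21300400\t301000\n' |
    select_regions deletion

# With a real calls file, the selected regions could be piped on to genotyping:
#   select_regions deletion < calls.10000.tsv | cnvpytor -root file.pytor -genotype 10000 100000
```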