Individual level compound heterozygosity - hms-dbmi/RaMeDiES GitHub Wiki
Individual-level compound heterozygosity
:cyclone: About
RamediesCH_IND is an algorithm for individual-level inference of monogenic causes of suspected recessive disorders. This model differs from the other two statistical models in the RaMeDiES suite (ramediesDN and ramediesCH) as follows:
- Although the other models may theoretically infer digenic causes of disease, ramediesCH_IND explicitly assumes that only up to one gene with compound heterozygous variants may be responsible for a recessive disorder.
- The other models gain power with increasing cohort sizes, whereas ramediesCH_IND loses power when applied to larger cohorts due to conservative multiple test correction.
- ramediesCH_IND is more powerful in the single-individual case and may produce better-ranked lists, even across cohorts of hundreds of individuals.
:cyclone: Quick Run
python ramediesCH_IND.py --i=/full/path/to/github/repo/RaMeDiES/test/input --o=test
:warning: Processed input variant files must include the inheritance column with `mom` or `dad` specified per variant.

:warning: By default, all properly-formatted and processed files within the directory specified by `--i` will be considered. You MUST REMOVE processed input files from this directory that correspond to individuals from families with genetic evidence of consanguinity.

:exclamation: Expected output file for our provided input test files can be found in `test/output/`.

Running on our provided test data, on a single 2.60GHz core with 0.5GB of RAM, should take about 3 minutes, 15 seconds.
:cyclone: Expected Output File
Only one output file, `{prefix}_comphet_individual_level.txt`, is produced. It has a header containing the Bonferroni correction factor, followed by one row for each gene containing a compound heterozygous variant pair in a patient. Each row has the following eight tab-delimited columns:
- `file_name`: input filename where this compound heterozygous variant pair was observed
- `ensembl_gene_id`: Ensembl gene ID
- `gene_name`: HGNC gene name
- `P_val`: uncorrected P-value
- `P_cond`: uncorrected P-value conditional on the observation of a compound heterozygous variant pair; these values are expected to be uniformly distributed under the null
- `y_stat`: compound heterozygous variant mutational target of the least expected compound heterozygous variant observed in this individual
- `poisson_lambda`: expected count of compound heterozygous variant pairs in this individual
- `variant_info`: pipe-delimited information for the paternally- then maternally-inherited variants making up this compound heterozygous variant pair (separated by an `&`):
  - variant chromosome
  - reference allele
  - variant position
  - alternate allele
  - two-letter code specifying the variant type (`CS` = coding SNV, `CI` = coding indel, `IS` = intronic SNV, `II` = intronic indel)
  - variant functionality score
  - variant inheritance (first value in the `&`-delimited pair is `P` for paternal, and second value will be `M` for maternal)
  - input variant file name
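The `variant_info` layout and the P-value columns described above can be unpacked with a few lines of Python. The following is a minimal sketch, not part of RaMeDiES: the names `Variant`, `parse_variant_info`, and `bonferroni_adjust` are hypothetical, the example string is made up purely to illustrate the field order, and the adjustment assumes that the correction factor in the header is the number of tests, so that a corrected P-value is the uncorrected one multiplied by that factor and capped at 1.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """One half of a compound heterozygous pair (hypothetical helper type)."""
    chromosome: str
    ref: str
    position: int
    alt: str
    variant_type: str  # CS, CI, IS or II
    score: float       # variant functionality score
    inheritance: str   # P (paternal) or M (maternal)
    source_file: str


def parse_variant(text: str) -> Variant:
    """Split one pipe-delimited variant into its eight documented fields."""
    fields = text.split("|")
    if len(fields) != 8:
        raise ValueError(f"expected 8 pipe-delimited fields, got {len(fields)}")
    chrom, ref, pos, alt, vtype, score, inheritance, source = fields
    return Variant(chrom, ref, int(pos), alt, vtype, float(score), inheritance, source)


def parse_variant_info(text: str) -> tuple[Variant, Variant]:
    """Return (paternal, maternal) variants from an '&'-separated pair."""
    paternal_text, maternal_text = text.split("&")
    return parse_variant(paternal_text), parse_variant(maternal_text)


def bonferroni_adjust(p_value: float, correction_factor: float) -> float:
    """Standard Bonferroni adjustment, assuming the factor is the test count."""
    return min(1.0, p_value * correction_factor)


# Made-up example string; real values come from the variant_info column.
example = "7|G|100|A|CS|0.91|P|patient1.txt&7|C|200|T|IS|0.42|M|patient1.txt"
paternal, maternal = parse_variant_info(example)
print(paternal.variant_type, maternal.variant_type)
```

Check the header of your own output file before applying `bonferroni_adjust`, since the exact header layout is not described here.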
:cyclone: Parameters
Parameter | Description |
---|---|
`-h`, `--help` | Show help message and exit |
`--variant_annots <>` | Types of variants considered; C for coding, I for intronic. Default: CI |
`--i <>` | Input directory containing preprocessed variant files. Must end with a forward slash `/`. |
`--o <>` | Prefix for the output files. Default: CHIND_result |
`--no_qual_track` | Do not use the Roulette-derived quality control column for filtering. This flag should be used only if the input variant files have already been QCed and contain highly-confident variants. |
`--coding_score` | Variant functionality score type to assign to coding SNVs. Options: [CADD, REVEL, AlphaMissense, PAI3D]. Default: CADD |
`--coding_snv_thr <>` | :exclamation: Minimal variant functionality score allowed for coding SNVs. Default: 0.5 for CADD (non-Phred-scaled). Suggested alternative values are 0.2 for REVEL, 0.1 for AlphaMissense and 0.3 for PrimateAI3D. |
`--coding_indel_thr <>` | :exclamation: Minimal variant functionality score allowed for coding indel variants. Default: 0.5 for CADD (non-Phred-scaled), which is the only currently supported deleteriousness score for coding indels. |
`--SAI_thr <>` | :exclamation: Minimal SpliceAI score allowed for intronic variants. Default: 0.05 |
`--MAF <>` | :exclamation: Maximal MAF (minor allele frequency) allowed for all variants. The "MAF" parameter must be specified. Default: -1 (no filter) |
`--missense_run` | Flag to include only coding SNVs with a missense impact; this option is required when using REVEL, AlphaMissense, or PrimateAI3D, which only score missense SNVs. |
`--suppress_indels <>` | Flag to exclude indel variants |
:exclamation: We highly recommend imposing variant functionality and MAF constraints, because the expected number of inherited, rare variants per gene (that could contribute to a compound heterozygous configuration) increases with more lenient variant functionality constraints. We filter by CADD and SpliceAI scores in our implementation.
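
The parameter constraints above (the suggested threshold per score type, the `--missense_run` requirement for missense-only scores, and the trailing slash on the input directory) can be captured in a small helper that assembles the command line. This is a hedged sketch rather than part of RaMeDiES: `build_command` and the two lookup tables are hypothetical names, the `--flag=value` syntax is copied from the Quick Run example, the `PAI3D` option is assumed to refer to PrimateAI3D, and the MAF value in the usage line is an arbitrary placeholder, not a recommendation.

```python
# Suggested minimal coding SNV scores, taken from the parameter table.
SUGGESTED_SNV_THRESHOLDS = {
    "CADD": 0.5,
    "REVEL": 0.2,
    "AlphaMissense": 0.1,
    "PAI3D": 0.3,  # assumed to be the option name for PrimateAI3D
}

# Score types that only score missense SNVs and so need --missense_run.
MISSENSE_ONLY_SCORES = {"REVEL", "AlphaMissense", "PAI3D"}


def build_command(input_dir, prefix, coding_score="CADD", maf=None):
    """Assemble an argument list for ramediesCH_IND.py (hypothetical helper)."""
    if coding_score not in SUGGESTED_SNV_THRESHOLDS:
        raise ValueError(f"unsupported coding score: {coding_score}")
    # The input directory must end with a forward slash.
    if not input_dir.endswith("/"):
        input_dir += "/"
    command = [
        "python",
        "ramediesCH_IND.py",
        f"--i={input_dir}",
        f"--o={prefix}",
        f"--coding_score={coding_score}",
        f"--coding_snv_thr={SUGGESTED_SNV_THRESHOLDS[coding_score]}",
    ]
    if coding_score in MISSENSE_ONLY_SCORES:
        command.append("--missense_run")
    if maf is not None:
        command.append(f"--MAF={maf}")
    return command


# Placeholder path, prefix, and MAF value, for illustration only.
print(" ".join(build_command("/path/to/input", "my_run", "REVEL", maf=0.01)))
```

The returned list can be passed to `subprocess.run` from the directory containing `ramediesCH_IND.py`, or printed and pasted into a shell.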