Individual level compound heterozygosity - hms-dbmi/RaMeDiES GitHub Wiki

Individual-level compound heterozygosity


:cyclone: About

RamediesCH_IND is an algorithm for individual-level inference of monogenic causes of suspected recessive disorders. This model differs from the other two statistical models in the RaMeDiES suite (ramediesDN and ramediesCH) as follows:

  • Although the other models may theoretically infer digenic causes of disease, ramediesCH_IND explicitly assumes that only up to one gene with compound heterozygous variants may be responsible for a recessive disorder.
  • The other models gain power with increasing cohort sizes, whereas ramediesCH_IND loses power when applied to larger cohorts due to conservative multiple test correction.
  • RamediesCH_IND is more powerful in the single-individual case and may produce better-ranked lists, even across cohorts of hundreds of individuals.
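The power loss from multiple test correction can be illustrated with a minimal sketch of a Bonferroni adjustment. It assumes, purely for illustration, that the number of tests grows in proportion to the number of individuals analyzed. The function name `bonferroni_adjust` and the per-individual test count are hypothetical; the correction factor actually applied by ramediesCH_IND is the one reported in its output file header.

```python
def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Return the Bonferroni-adjusted P-value, capped at 1."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return min(1.0, p_value * n_tests)


# Hypothetical numbers: one raw P-value and an assumed 10 tests per individual.
raw_p = 1e-5
tests_per_individual = 10

for cohort_size in (1, 10, 100):
    adjusted = bonferroni_adjust(raw_p, cohort_size * tests_per_individual)
    print(f"{cohort_size:>4} individuals: adjusted P = {adjusted:.3g}")
```

Under these assumptions the same raw P-value becomes less significant as the cohort grows, which is the behavior described in the second bullet.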

:cyclone: Quick Run

python ramediesCH_IND.py --i=/full/path/to/github/repo/RaMeDiES/test/input/ --o=test

:warning: Processed input variant files must include the inheritance column with mom or dad specified per variant.

:warning: By default, all properly-formatted and processed files within the directory specified by --i will be considered. You MUST REMOVE processed input files from this directory that correspond to individuals from families with genetic evidence of consanguinity.

:exclamation: Expected output file for our provided input test files can be found in test/output/.

Running on our provided test data, on a single 2.60GHz core with 0.5GB of RAM, should take about 3 minutes, 15 seconds.

:cyclone: Expected Output File

Only one output file, {prefix}_comphet_individual_level.txt, is produced. It has a header containing the Bonferroni correction factor, followed by one row for each gene containing a compound heterozygous variant pair in a patient. Each row has the following eight tab-delimited columns:

  1. file_name: input filename where this compound heterozygous variant pair was observed
  2. ensembl_gene_id: Ensembl gene ID
  3. gene_name: HGNC gene name
  4. P_val: uncorrected P-value
  5. P_cond: uncorrected P-value conditional on the observation of a compound heterozygous variant pair; these values are expected to be uniformly distributed under the null
  6. y_stat: compound heterozygous variant mutational target of the least expected compound heterozygous variant observed in this individual
  7. poisson_lambda: expected count of compound heterozygous variant pairs in this individual
  8. variant_info: pipe-delimited information for the paternally- then maternally-inherited variants making up this compound heterozygous variant pair (separated by an &):
    1. variant chromosome
    2. reference allele
    3. variant position
    4. alternate allele
    5. two-letter code specifying the variant type (CS = coding SNV, CI = coding indel, IS = intronic SNV, II = intronic indel)
    6. variant functionality score
    7. variant inheritance (the first value in the &-delimited pair is P for paternal, and the second is M for maternal)
    8. input variant file name
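As a reading aid, here is a minimal sketch of how the variant_info column could be split into its two variants and eight fields. It assumes the fields appear in exactly the order listed above and that no field contains a | or & character. The dictionary keys, the function name `parse_variant_info`, and the example value are all hypothetical and are not taken from real output.

```python
from typing import Dict, List

# Keys follow the field order listed above; the key names are illustrative.
VARIANT_FIELDS = [
    "chromosome", "ref", "position", "alt",
    "variant_type", "score", "inheritance", "source_file",
]


def parse_variant_info(variant_info: str) -> List[Dict[str, str]]:
    """Split a variant_info value into [paternal, maternal] field dictionaries."""
    variants = []
    for chunk in variant_info.strip().split("&"):
        values = chunk.split("|")
        if len(values) != len(VARIANT_FIELDS):
            raise ValueError(
                f"expected {len(VARIANT_FIELDS)} fields, got {len(values)}"
            )
        variants.append(dict(zip(VARIANT_FIELDS, values)))
    if len(variants) != 2:
        raise ValueError("expected exactly two '&'-separated variants")
    return variants


# Made-up example value for illustration only.
example = "1|A|12345|G|CS|0.91|P|child1.txt&1|C|12999|T|CI|0.77|M|child1.txt"
paternal, maternal = parse_variant_info(example)
print(paternal["inheritance"], maternal["inheritance"])
```

All values are kept as strings so that nothing is assumed about how positions or scores are formatted in the real file.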

:cyclone: Parameters

| Parameter | Description |
|---|---|
| -h, --help | Show help message and exit |
| --variant_annots <> | Types of variants considered; C for coding, I for intronic. Default: CI |
| --i <> | Input directory containing preprocessed variant files. Must end with a forward slash /. |
| --o <> | Prefix for the output files. Default: CHIND_result |
| --no_qual_track | Do not use the Roulette-derived quality control column for filtering. Use this flag only if the input variant files have already been QCed and contain high-confidence variants. |
| --coding_score | Variant functionality score type to assign to coding SNVs. Options: [CADD, REVEL, AlphaMissense, PAI3D]. Default: CADD |
| --coding_snv_thr <> | :exclamation: Minimal variant functionality score allowed for coding SNVs. Default: 0.5 for CADD (non-Phred-scaled). Suggested alternative values are 0.2 for REVEL, 0.1 for AlphaMissense and 0.3 for PrimateAI3D. |
| --coding_indel_thr <> | :exclamation: Minimal variant functionality score allowed for coding indel variants. Default: 0.5 for CADD (non-Phred-scaled), which is the only currently supported deleteriousness score for coding indels. |
| --SAI_thr <> | :exclamation: Minimal SpliceAI score allowed for intronic variants. Default: 0.05 |
| --MAF <> | :exclamation: Maximal MAF (minor allele frequency) allowed for all variants. The "MAF" parameter must be specified. Default: -1 (no filter) |
| --missense_run | Flag to include only coding SNVs with a missense impact; this option is required when using REVEL, AlphaMissense, or PrimateAI3D, which only score missense SNVs. |
| --suppress_indels <> | Flag to exclude indel variants |

:exclamation: We highly recommend imposing variant functionality and MAF constraints, because the expected number of inherited, rare variants per gene (that could contribute to a compound heterozygous configuration) increases with more lenient variant functionality constraints. We filter by CADD and SpliceAI scores in our implementation.
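The following sketch shows one way to assemble a run that applies these constraints, using only the options documented above. It assumes the script accepts the `--flag=value` form used in the Quick Run command for every option. The helper name `build_command`, the input path, the output prefix, and the MAF value of 0.01 are hypothetical placeholders, not recommendations.

```python
import shlex
from typing import List, Optional


def build_command(input_dir: str, prefix: str, coding_score: str = "CADD",
                  coding_snv_thr: Optional[float] = None,
                  maf: Optional[float] = None) -> List[str]:
    """Assemble an argument list for ramediesCH_IND.py from documented options."""
    if not input_dir.endswith("/"):
        input_dir += "/"  # --i must end with a forward slash
    cmd = ["python", "ramediesCH_IND.py", f"--i={input_dir}", f"--o={prefix}",
           f"--coding_score={coding_score}"]
    if coding_snv_thr is not None:
        cmd.append(f"--coding_snv_thr={coding_snv_thr}")
    if maf is not None:
        cmd.append(f"--MAF={maf}")
    if coding_score in {"REVEL", "AlphaMissense", "PAI3D"}:
        cmd.append("--missense_run")  # these scores cover missense SNVs only
    return cmd


# Hypothetical run: REVEL with its suggested threshold and a placeholder MAF cutoff.
cmd = build_command("/path/to/input", "revel_run", coding_score="REVEL",
                    coding_snv_thr=0.2, maf=0.01)
print(shlex.join(cmd))
```

The printed command can be reviewed before running it; the helper only builds the argument list and does not execute anything.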