Frequently Used Variant Pathogenicity and Constraint Scores - core-unit-bioinformatics/knowledge-base GitHub Wiki
This page explains some pathogenicity and constraint scores that are frequently used to identify variants of interest.
Scores available as VEP plugins (also within the Variant Interpretation Pipeline):
REVEL - A variant-level missense pathogenicity score. Higher values indicate a higher likelihood of pathogenicity.
- Plugin explanation: https://github.com/Ensembl/VEP_plugins/blob/release/115/REVEL.pm
- Data download: https://sites.google.com/site/revelgenomics/downloads
- Comments:
- Needs the additional parameter --assembly GRCh38 to run.
- Did not yet work for me as a VEP plugin. Instead, I extracted the needed data from the file (see Data download) with a Python script by matching transcript ID, position, reference base and alternative base.
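The matching by transcript ID, position, reference base and alternative base could look roughly like the sketch below. It is not the original script. It assumes the REVEL download is a comma-separated file with the header columns chr, hg19_pos, grch38_pos, ref, alt, aaref, aaalt, REVEL and Ensembl_transcriptid, and that several transcript IDs may be listed in one row separated by ";". Please check both assumptions against your copy of the file. The function names and the example rows are made up for illustration.

```python
import csv
import io


def build_revel_index(handle):
    """Index REVEL scores by (transcript ID, GRCh38 position, ref, alt).

    Assumes a comma-separated file with a header row containing at least
    grch38_pos, ref, alt, REVEL and Ensembl_transcriptid.
    """
    index = {}
    for row in csv.DictReader(handle):
        if row["grch38_pos"] == ".":  # assumed marker for "no GRCh38 coordinate"
            continue
        # One row may list several transcripts separated by ';' (assumption).
        for transcript in row["Ensembl_transcriptid"].split(";"):
            key = (transcript, int(row["grch38_pos"]), row["ref"], row["alt"])
            index[key] = float(row["REVEL"])
    return index


def lookup_revel(index, transcript, pos, ref, alt):
    """Return the REVEL score for one variant, or None if it is not listed."""
    # Drop a version suffix, e.g. 'ENST00000335137.4' -> 'ENST00000335137'.
    return index.get((transcript.split(".")[0], pos, ref, alt))


# Tiny made-up example; the values are illustrative, not real REVEL scores.
example = io.StringIO(
    "chr,hg19_pos,grch38_pos,ref,alt,aaref,aaalt,REVEL,Ensembl_transcriptid\n"
    "1,35142,35142,G,A,T,M,0.027,ENST00000417324\n"
    "1,69091,69091,A,C,M,L,0.366,ENST00000335137;ENST00000641515\n"
)
revel_index = build_revel_index(example)
print(lookup_revel(revel_index, "ENST00000641515.2", 69091, "A", "C"))
```

The full REVEL file is large, so for real data it is probably better to collect the variants of interest first and keep only the matching rows while streaming through the file, instead of indexing everything in memory.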
CADD - A variant-level deleteriousness score (PHRED-like scaled). Higher values indicate greater predicted deleteriousness.
- Plugin explanation: https://github.com/Ensembl/VEP_plugins/blob/release/115/CADD.pm
- Data download: https://cadd.gs.washington.edu/download
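To make the PHRED-like scaling more concrete: the scaled CADD score is derived from the rank of a variant among all possible substitutions as -10 * log10(rank / total), so a score of 10 corresponds to the top 10%, 20 to the top 1% and 30 to the top 0.1%. A minimal sketch of this relation (the function names are made up for illustration):

```python
import math


def rank_to_cadd_phred(rank, total):
    """PHRED-like scaling: -10 * log10 of the relative rank (1 = most deleterious)."""
    return -10 * math.log10(rank / total)


def cadd_phred_to_top_fraction(phred):
    """Fraction of all possible substitutions scoring at least this high."""
    return 10 ** (-phred / 10)


for score in (10, 20, 30):
    print(score, cadd_phred_to_top_fraction(score))
```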
SpliceAI - A variant-level splice disruption probability score, representing the strongest predicted splice effect.
- Plugin explanation: https://github.com/Ensembl/VEP_plugins/blob/release/115/SpliceAI.pm
- Data download: https://basespace.illumina.com/s/otSPW8hnhaZR
LOEUF - A gene-level LoF intolerance score. Lower values indicate stronger intolerance to loss-of-function variants.
- Plugin explanation: https://github.com/Ensembl/VEP_plugins/blob/release/115/LOEUF.pm
- Data download: https://gnomad.broadinstitute.org/downloads#v2-constraint
- ALTERNATIVE (which I used): Download supplementary data 11 of this publication (https://www.nature.com/articles/s41586-020-2308-7) and match the entries to your data via transcript ID.
Scores without VEP plugin:
MIS_Z - A gene-level constraint score. Higher values indicate stronger depletion of missense variants.
- Data download: https://www.nature.com/articles/s41586-020-2308-7, supplementary data 11
- Comment: The mentioned supplementary data file contains MIS_Z scores as well as LOEUF scores, so both can be processed together instead of using the VEP plugin for LOEUF.
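Processing both gene-level scores together via transcript ID could be sketched as follows. This assumes the supplementary file is tab-separated and uses the column names transcript, oe_lof_upper (LOEUF) and mis_z, as in the gnomAD v2 constraint table. Please verify the column names in your download. The function names, transcript IDs and values in the example are made up.

```python
import csv
import io


def _to_float(value):
    """Convert a table cell to float; return None for entries such as 'NA'."""
    try:
        return float(value)
    except ValueError:
        return None


def load_constraint(handle):
    """Map transcript ID -> LOEUF and MIS_Z from a tab-separated constraint table."""
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        table[row["transcript"]] = {
            "LOEUF": _to_float(row["oe_lof_upper"]),  # assumed column name
            "MIS_Z": _to_float(row["mis_z"]),         # assumed column name
        }
    return table


def annotate(variants, constraint):
    """Add LOEUF and MIS_Z to each variant dict via its 'transcript' key."""
    missing = {"LOEUF": None, "MIS_Z": None}
    annotated = []
    for variant in variants:
        transcript = variant["transcript"].split(".")[0]  # drop version suffix
        annotated.append({**variant, **constraint.get(transcript, missing)})
    return annotated


# Made-up rows for illustration only; these are not real gnomAD values.
example = io.StringIO(
    "gene\ttranscript\toe_lof_upper\tmis_z\n"
    "GENE_A\tENST00000000001\t0.25\t3.1\n"
    "GENE_B\tENST00000000002\tNA\t-0.4\n"
)
constraint = load_constraint(example)
variants = [
    {"id": "var1", "transcript": "ENST00000000001.5"},
    {"id": "var2", "transcript": "ENST00000000002"},
    {"id": "var3", "transcript": "ENST00000000003"},
]
annotated = annotate(variants, constraint)
for entry in annotated:
    print(entry)
```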
VIP integration
- These plugins can be added to the Variant Interpretation Pipeline by putting them in the "run.config" file, e.g. like this (see the line starting with '--format vcf'):
process {
    errorStrategy = { task.exitStatus in (1..200) ? 'retry' : 'finish' }
    maxRetries = 3
    withName: "ENSEMBLVEP_VEP" {
        cpus = 32
        memory = 125.GB
        time = 72.h
        ext.args = {
            '--format vcf --offline --refseq --check_existing --everything --no_escape --flag_pick_allele_gene --terms SO --clin_sig_allele 1 --var_synonyms --vcf --assembly GRCh38 --plugin REVEL,file=/path/to/new_tabbed_revel_grch38.tsv.gz --plugin SpliceAI,snv=/path/to/spliceai_scores.raw.snv.hg38.vcf.gz,indel=/path/to/spliceai_scores.raw.indel.hg38.vcf.gz --plugin CADD,snv=/path/to/cadd_whole_genome_SNVs.tsv.gz,indels=/path/to/cadd_gnomad.genomes.r4.0.indel.tsv.gz'
        }
    }
}
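Because the argument string gets long and a typo in it is easy to miss, it can help to assemble it from its parts before pasting it into "run.config". The helper below is only a convenience sketch and not part of the pipeline. The function name build_vep_args is made up, and the plugin options and placeholder paths are copied from the example above.

```python
def build_vep_args(base_flags, plugins):
    """Join base VEP flags and '--plugin NAME,key=value,...' entries into one string."""
    parts = list(base_flags)
    for name, options in plugins.items():
        fields = [name] + [f"{key}={value}" for key, value in options.items()]
        parts.append("--plugin " + ",".join(fields))
    return " ".join(parts)


base_flags = [
    "--format vcf", "--offline", "--refseq", "--check_existing", "--everything",
    "--no_escape", "--flag_pick_allele_gene", "--terms SO", "--clin_sig_allele 1",
    "--var_synonyms", "--vcf", "--assembly GRCh38",
]
plugins = {
    "REVEL": {"file": "/path/to/new_tabbed_revel_grch38.tsv.gz"},
    "SpliceAI": {
        "snv": "/path/to/spliceai_scores.raw.snv.hg38.vcf.gz",
        "indel": "/path/to/spliceai_scores.raw.indel.hg38.vcf.gz",
    },
    "CADD": {
        "snv": "/path/to/cadd_whole_genome_SNVs.tsv.gz",
        "indels": "/path/to/cadd_gnomad.genomes.r4.0.indel.tsv.gz",
    },
}
vep_args = build_vep_args(base_flags, plugins)
print(vep_args)
```

The printed string can then be copied between the single quotes of ext.args.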