Vocab. OMOP_Genomic - OHDSI/Vocabulary-v5.0 GitHub Wiki
There is no comprehensive genomic terminology available in the public domain that would support the harmonization of genomic oncology data, which is necessary for standardized analytics in a research network.
To address this gap and to enable interoperability between genomic data and clinical care, the OHDSI Oncology Workgroup and the VICC consortium collaborated to develop a canonical representation of equivalent variants that serves as standard concepts in the OMOP Standardized Vocabularies.
The following ontologies were incorporated:
- HGNC (HUGO Gene Nomenclature Committee, https://www.genenames.org/)
- ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/)
- CIViC (Clinical Interpretation of Variants in Cancer, https://civicdb.org/home)
- NCIt (National Cancer Institute Thesaurus, https://ncithesaurus.nci.nih.gov/ncitbrowser/)
- CGI (Cancer Genome Interpreter, https://www.cancergenomeinterpreter.org/home)
- CAP (College of American Pathologists Cancer Checklists, https://www.cap.org/)
- JAX CKB (The Clinical Knowledgebase by The Jackson Laboratory, https://www.jax.org/clinical-genomics/ckb#)
The procedures for transforming Concepts from the source to the OMOP Standardized Vocabularies can be found on the OHDSI GitHub. The current version of the OMOP Genomic vocabulary is powered by KOIOS.
KOIOS is an open-source tool, developed and supported by the OHDSI Oncology WG, that lets users combine their variant data with the OMOP Genomic Vocabulary to generate a set of genomic standard concept IDs from raw patient-level genomic data.
| Concept Class | Mechanism | Example |
|---|---|---|
| Structural Variant | 1. (type of variation (e.g. del/ins/t))(chromosomal coordinate 1)(chromosomal coordinate 2) measurement<br>2. Trisomy/Monosomy/Tetrasomy (chromosome) measurement<br>3. Microsatellite Instability (MSS) (Affected Locus) measurement | del(1)(p11) measurement |
| Gene RNA Variant | (Type of Variation) of (Short Variant detail, e.g. A > G, where applicable)<br>OR (Type of Variation) of (Short Variant detail, e.g. A > G, where applicable) | A1CF transcript: Substitution in position 100 of G replaced by A measurement |
| Gene Protein Variant | (Type of Variation) of (Protein detail, e.g. A > E, where applicable)<br>OR (Type of Variation) of (Protein detail, e.g. A > E, where applicable) | A1CF protein: Substitution in position 34 of E replaced by A measurement |
| Genetic Variation | (Gene Symbol) (Long gene name) gene variant measurement | A1BG (alpha-1-B glycoprotein) gene variant measurement |
| Gene Variant | (Gene Symbol 1)::(Gene Symbol 2) gene fusion measurement | A2M::ALK gene fusion measurement |
| Gene DNA Variant | (Type of Variation) in (Coordinate 1) of (Short Variant detail, e.g. A > G, where applicable)<br>OR (Type of Variation) in (Coordinate 1) to (Coordinate 2, where applicable) of (Short Variant detail, e.g. A > G, where applicable) | A1CF on GRCh38 chr10: Substitution in position 50844121 of T replaced by G measurement |
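The naming patterns above can be illustrated with a minimal Python sketch that assembles concept names from their parts. The class and function names here (`DnaSubstitution`, `dna_variant_name`, `gene_variation_name`, `fusion_name`) are hypothetical and are not part of the vocabulary build or of KOIOS. The string templates are inferred only from the three example names in the table (the fusion separator follows the `A2M::ALK` example), so other variation types may be worded differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnaSubstitution:
    """Hypothetical container for a single-nucleotide substitution."""
    gene: str        # gene symbol, e.g. "A1CF"
    assembly: str    # reference assembly, e.g. "GRCh38"
    chromosome: str  # e.g. "chr10"
    position: int    # genomic coordinate
    ref: str         # base being replaced
    alt: str         # replacement base


def dna_variant_name(v: DnaSubstitution) -> str:
    # Template inferred from the Gene DNA Variant example row (assumption).
    return (
        f"{v.gene} on {v.assembly} {v.chromosome}: "
        f"Substitution in position {v.position} of {v.ref} "
        f"replaced by {v.alt} measurement"
    )


def gene_variation_name(symbol: str, long_name: str) -> str:
    # Genetic Variation: (Gene Symbol) (Long gene name) gene variant measurement
    return f"{symbol} ({long_name}) gene variant measurement"


def fusion_name(symbol_1: str, symbol_2: str) -> str:
    # Gene Variant: two gene symbols joined by "::" as in the table example.
    return f"{symbol_1}::{symbol_2} gene fusion measurement"


if __name__ == "__main__":
    print(dna_variant_name(DnaSubstitution("A1CF", "GRCh38", "chr10", 50844121, "T", "G")))
    print(gene_variation_name("A1BG", "alpha-1-B glycoprotein"))
    print(fusion_name("A2M", "ALK"))
```

In practice, concept names should be looked up in the CONCEPT table rather than generated; the sketch only makes the structure of the names explicit.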
Concept Codes were generated as OMOP# where # is a number unique to each concept.
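A quick sanity check on this code format might look like the following sketch. It assumes that `#` consists of decimal digits only, with no separator after the `OMOP` prefix; the function name `check_concept_codes` is hypothetical.

```python
import re
from collections import Counter

# Assumption: a concept code is the literal prefix "OMOP" followed by digits.
CODE_PATTERN = re.compile(r"OMOP\d+")


def check_concept_codes(codes):
    """Return (malformed, duplicates) for a list of concept codes."""
    # Codes that do not match the assumed OMOP# shape, in input order.
    malformed = [c for c in codes if not CODE_PATTERN.fullmatch(c)]
    # Codes that occur more than once, sorted for stable output.
    counts = Counter(codes)
    duplicates = sorted(c for c, n in counts.items() if n > 1)
    return malformed, duplicates


if __name__ == "__main__":
    sample = ["OMOP1", "OMOP2", "OMOP2", "omop3", "OMOP"]
    print(check_concept_codes(sample))
```

Both returned lists are empty when every code is well formed and unique.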
All Valid OMOP Genomic concepts are Standard.
All concepts belong to the Measurement domain.
Concept Classes:
- Genetic Variation
- Gene Variant
- Structural Variant
- Gene DNA Variant
- Gene RNA Variant
- Gene Protein Variant
Based on knowledge about the processes of transcription and translation, the following hierarchy of concepts was introduced in OMOP Genomic:
Some OMOP Genomic concepts are non-Standard. That means they have to be mapped to the corresponding Standard Concepts using the CONCEPT_RELATIONSHIP table ("Maps to" records).
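As a rough illustration of following "Maps to" records, the sketch below builds a tiny in-memory SQLite database with cut-down CONCEPT and CONCEPT_RELATIONSHIP tables and resolves a non-Standard concept to its Standard target. The column names follow the OMOP CDM, but only a subset of columns is created, and the concept IDs, concept names, and the `build_demo_db` and `map_to_standard` helpers are made up for this example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE concept (
            concept_id       INTEGER PRIMARY KEY,
            concept_name     TEXT,
            vocabulary_id    TEXT,
            standard_concept TEXT,
            invalid_reason   TEXT
        );
        CREATE TABLE concept_relationship (
            concept_id_1    INTEGER,
            concept_id_2    INTEGER,
            relationship_id TEXT,
            invalid_reason  TEXT
        );
        """
    )
    # Made-up rows: concept 1 is non-Standard, concept 2 is Standard.
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Hypothetical non-Standard variant", "OMOP Genomic", None, "U"),
            (2, "Hypothetical Standard variant", "OMOP Genomic", "S", None),
        ],
    )
    conn.executemany(
        "INSERT INTO concept_relationship VALUES (?, ?, ?, ?)",
        [(1, 2, "Maps to", None), (2, 2, "Maps to", None)],
    )
    return conn


def map_to_standard(conn, source_concept_id):
    """Return Standard concept IDs reachable via valid 'Maps to' records."""
    rows = conn.execute(
        """
        SELECT c2.concept_id
        FROM concept_relationship AS cr
        JOIN concept AS c2 ON c2.concept_id = cr.concept_id_2
        WHERE cr.concept_id_1 = ?
          AND cr.relationship_id = 'Maps to'
          AND cr.invalid_reason IS NULL
          AND c2.standard_concept = 'S'
        ORDER BY c2.concept_id
        """,
        (source_concept_id,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(map_to_standard(conn, 1))
```

Against a real vocabulary download, the same query shape applies to the full tables; an empty result means the source concept has no valid Standard target.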
Some Standard Concepts (Concept Class Genetic Variation) have valid "Mapped from" and "Subsumes" relationships to concepts in the CPT4 and HCPCS vocabularies. A few Standard Concepts (Concept Class Structural Variant) have valid "Variant of" relationships to concepts in the SNOMED vocabulary.
Most OMOP Genomic Concepts have multiple records in the CONCEPT_RELATIONSHIP table, but some are mapped to a single Concept and therefore generate one-to-one records.
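To see which concepts fall into each group, one could count CONCEPT_RELATIONSHIP records per source concept. The sketch below works on plain tuples rather than a database; the helper name `split_by_fanout` and the sample rows are hypothetical.

```python
from collections import Counter


def split_by_fanout(rows):
    """Split source concepts by how many relationship records they have.

    rows: iterable of (concept_id_1, relationship_id, concept_id_2) tuples.
    Returns (single, multiple): sorted concept IDs with exactly one record
    and with more than one record, respectively.
    """
    counts = Counter(concept_id_1 for concept_id_1, _rel, _concept_id_2 in rows)
    single = sorted(c for c, n in counts.items() if n == 1)
    multiple = sorted(c for c, n in counts.items() if n > 1)
    return single, multiple


if __name__ == "__main__":
    # Made-up records: concept 10 has three relationships, concept 20 has one.
    demo_rows = [
        (10, "Maps to", 10),
        (10, "Mapped from", 501),
        (10, "Subsumes", 502),
        (20, "Maps to", 20),
    ]
    print(split_by_fanout(demo_rows))
```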