Upcoming changes - OHDSI/Vocabulary-v5.0 GitHub Wiki
Pre-/Post-coordination migration
According to the discussion around the post-coordination of pre-coordinated SNOMED measurements, the Vocabulary Team does not perform any further post-coordination work.
The pack_content table, which is unavailable in Athena, will be available via the Vocabulary-v5.0 GitHub repository.
We will deliver the SSSOM-compatible metadata for certain relationships. The metadata will mimic the one we collect during the community contribution process and will contain the following information:
- Relationship metadata:
  - Relationship_id predicate (how precise the relationships are): equivalent [exactMatch], uphill [broadMatch], downhill [narrowMatch]
  - Mapping tool (name of the tool or approach used to derive mappings, if any)
  - Mapping source (source of the mapping if relationships come from curated sources such as UMLS)
  - Confidence (how confident a human reviewer is about the quality of the mapping)
  - Mapper_id (identifier of the person who performed the mappings, if any)
  - Reviewer_id (identifier of the person who reviewed the mappings, if any)
- Concept metadata

NB! The structure of the basic vocabulary tables (i.e. concept, concept_relationship) will not be changed, and the first release of metadata will be stored in separate tables.
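As a rough illustration of the relationship metadata listed above, one record could be modeled as follows. This is a sketch only: the class name, the field types, the `skos:` prefix form of the predicates, and the 0 to 1 confidence range (borrowed from the SSSOM convention) are assumptions, not the published table layout.

```python
from dataclasses import dataclass
from typing import Optional

# Labels from the list above mapped to SSSOM-style predicates.
# The "skos:" serialization is an assumption made for this sketch.
PREDICATES = {
    "equivalent": "skos:exactMatch",
    "uphill": "skos:broadMatch",
    "downhill": "skos:narrowMatch",
}


@dataclass(frozen=True)
class RelationshipMetadata:
    """Hypothetical metadata row describing one concept_relationship record."""

    concept_id_1: int
    concept_id_2: int
    relationship_id: str
    predicate: str
    mapping_tool: Optional[str] = None
    mapping_source: Optional[str] = None
    confidence: Optional[float] = None  # assumed to lie in [0, 1]
    mapper_id: Optional[str] = None
    reviewer_id: Optional[str] = None

    def __post_init__(self):
        # Reject predicates outside the three listed precision levels.
        if self.predicate not in PREDICATES.values():
            raise ValueError(f"unknown predicate: {self.predicate}")
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


# Example with placeholder concept ids and a placeholder tool name.
example = RelationshipMetadata(
    concept_id_1=1,
    concept_id_2=2,
    relationship_id="Maps to",
    predicate=PREDICATES["equivalent"],
    mapping_tool="manual",
    confidence=0.9,
)
```

Keeping the metadata in its own record type mirrors the note above that the basic vocabulary tables stay unchanged and the metadata lives in separate tables.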
- In partnership with the Critical Path Institute, we introduce CDISC (Clinical Data Interchange Standards Consortium), a cornerstone terminology used in SDTM (a data model for clinical trial data). We used the 2023_11 version.
  - Concepts’ level overview:
    - We used the NCI Metathesaurus as the source. We included the full list of CDISC values available.
    - In some cases, concept codes are a composite of a unique source identifier and CDISC code values (where available), separated by ‘-’.
    - The vocabulary is enriched with synonyms to facilitate the ETL process.
    - All the CDISC concepts are non-standard.
  - Relationships’ level overview:
    - Mappings are limited to those that can be either derived algorithmically from the NCI Metathesaurus tables for clear 1:1 mappings, or curated manually for pre-selected, prioritized ones.
| Vocabulary | Vocabulary Module | Version |
| --- | --- | --- |
| ATC | NA | January 2024 version |
| LOINC | NA | 2.77 Feb 2024 version |
| SNOMED | SNOMED CT International | 2024-02-01 |
| SNOMED | SNOMED CT US Edition | 2024-03-01 |
| SNOMED | SNOMED CT UK Edition | 2024-04-10 |
| MedDRA | NA | version 27.0, effective from March 2024 |
| HCPCS | NA | July 2024 |
| CPT4 | NA | 2024AA |
| CMS Place of Service vocabulary | NA | May 2, 2024 |
| ICD10CM | NA | FY2025, effective from October 01, 2024 |
| EDI | NA | July 2024 version |
| Visit | NA | August 2024 |
| OMOP Extension | NA | August 2024 |
- HCPCS:
  - We improved the precision of validity dates of reused codes to ensure more accurate cohort building.
- CMS Place of Service:
  - Concept ‘27, Inpatient Long-term Care (Deprecated)’ has been deprecated by the source and its concept code was reused.
    NB! According to our current policy, we do not update the names of the reused codes.
- ATC:
  - We corrected the names of ATC classes based on the refined assignment of routes of administration.
- CMS Place of Service:
  - We added a note ‘(Deprecated)’ to the names of the codes deprecated by the source concepts. These codes are maintained as Standard (unless they are mapped).
- PPI:
  - Concept name has been changed for some concepts as a part of the Community Contribution Initiative.
- SNOMED:
  - Concepts related to ‘Allergic reactions’ have been moved to the Condition domain.
  - Concepts related to ‘Patient encounters’ have been moved to the Observation domain.
- PPI:
  - Concept class has been changed for some concepts as a part of the Community Contribution Initiative.
Vertical (hierarchical) relationships:
- LOINC - SNOMED hierarchy has been expanded.
- MedDRA - SNOMED / OMOP Extension hierarchy has been expanded.
- CPT4 - SNOMED hierarchy has been expanded. We embedded CPT4 anesthesia-related codes into the SNOMED hierarchy.
  ‘CPT4 - SNOMED cat’ relationships will continue to contribute to the hierarchy. After extensive testing, we concluded that excluding them from the hierarchy leaves a large number of standard procedure codes outside the concept_ancestor table.
- The ATC - RxNorm relationship hierarchy has been expanded and refined. We introduced a new source-based approach to derive relationships from existing drug vocabularies and greatly refined the relationships to improve accuracy and comprehensiveness. We constructed new relationships for the ATC codes introduced since the previous release. We cleaned up old relationships from ATC to RxNorm drugs.
Horizontal (non-hierarchical, mapping) relationships:
- SNOMED:
  - SNOMED concepts in the Measurement domain mapped to LOINC during the previous release have become standard and their mappings to LOINC have been reverted. LOINC - SNOMED deduplication will be discussed in future Vocabulary WG sessions.
  - Some of the SNOMED concepts were mapped over to their standard counterparts. This includes both cases when SNOMED concepts were standard and cases when SNOMED concepts did not have mappings. Currently, these affect the following chapters:
    - Vaccine and immunization codes in the procedure domain
    - Units, providers and drugs
    - Personal history and allergies
    - Geography
- CMS Place of Service:
  - Concepts ‘02, Telehealth Provided Other than in Patient’s Home’ and ‘10, Telehealth Provided in Patient’s Home’ have been destandardized and mapped to the Standard Telehealth concept in the Visit vocabulary. Other concepts previously mapped to either of these two concepts have been remapped to the same Telehealth concept.
For more advanced users, we have created a more detailed, vocabulary-focused version of the upcoming changes. Please find it below:
- The pack_content table, which is unavailable in Athena, will be available via the Vocabulary-v5.0 GitHub repository.
- We will deliver the SSSOM-compatible metadata for certain relationships. The metadata will mimic the one we collect during the community contribution process and will contain the following information:
  - Relationship metadata:
    - Relationship_id predicate (how precise the relationships are): equivalent [exactMatch], uphill [broadMatch], downhill [narrowMatch]
    - Mapping tool (name of the tool or approach used to derive mappings, if any)
    - Mapping source (source of the mapping if relationships come from curated sources such as UMLS)
    - Confidence (how confident a human reviewer is about the quality of the mapping)
    - Mapper_id (identifier of the person who performed the mappings, if any)
    - Reviewer_id (identifier of the person who reviewed the mappings, if any)
  - Concept metadata
NB! The structure of basic vocabulary tables (i.e. concept, concept_relationship) will not be changed, and the first release of metadata will be stored in separate tables.
- ATC has been refreshed to the most recent January 2024 version.
- Concepts’ level overview:
- We corrected the names of ATC classes based on the refined assignment of routes of administration.
- Relationships’ level overview:
- We introduced a new source-based approach to derive relationships from existing drug vocabularies and greatly refined the relationships to improve accuracy and comprehensiveness.
- We constructed new relationships for the ATC codes introduced since the previous release.
- We cleaned up old relationships from ATC to RxNorm drugs and ingredients.
- LOINC vocabulary has been updated with its 2.77 Feb 2024 version.
- Relationships’ level overview:
- The LOINC-SNOMED hierarchy has been expanded.
- SNOMED concepts in the Measurement domain mapped to LOINC during the previous release have become standard and their mappings to LOINC have been reverted. LOINC - SNOMED deduplication to be discussed in future Vocabulary WG sessions.
- SNOMED has been updated according to the following versions of its modules:
- 2024-02-01 SNOMED CT International;
- 2024-03-01 SNOMED CT US Edition;
- 2024-04-10 SNOMED CT UK Edition.
- Concepts’ level overview:
- Allergic reactions have been moved to the Condition domain.
- Patient encounters have been moved to the Observation domain.
- Relationships’ level overview:
- SNOMED concepts in the Measurement domain mapped to LOINC during the previous release have become standard and their mappings to LOINC have been reverted. LOINC - SNOMED deduplication to be discussed in future Vocabulary WG sessions.
- Some of the SNOMED concepts were mapped over to their standard counterparts. This includes both cases when SNOMED concepts were standard and cases when SNOMED concepts did not have mappings. Currently, these affect the following chapters:
  - Vaccine and immunization codes in the procedure domain
  - Units, providers and drugs
  - Personal history and allergies
  - Geography
- According to the discussion around the post-coordination of pre-coordinated SNOMED measurements, we do not perform any further post-coordination.
- MedDRA has been updated according to the source version 27.0, effective from March 2024.
- Relationships’ level overview:
- The MedDRA-SNOMED / OMOP Extension hierarchy has been expanded.
- Hierarchy mirroring (bugfix):
- Pre-coordinated MedDRA terms (Measurements) are currently positioned below general SNOMED concepts in affected cases.
- HCPCS has been updated according to the July 2024 version.
- Concepts’ level overview:
- We improved the precision of validity dates of reused codes to ensure more accurate cohort building.
- Relationships’ level overview:
- We improved drug mapping (incl. community contribution).
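Because reused codes share one concept_code across several concepts, validity dates decide which concept applies to a given event. The sketch below illustrates that lookup; the concept ids, the code `X0000`, and the dates are invented placeholders, and only the `valid_start_date` / `valid_end_date` idea comes from the concept table.

```python
from datetime import date

# Hypothetical rows: (concept_id, concept_code, valid_start_date, valid_end_date).
# The same code appears twice because the source reused it.
CONCEPTS = [
    (1001, "X0000", date(2005, 1, 1), date(2015, 12, 31)),
    (1002, "X0000", date(2016, 1, 1), date(2099, 12, 31)),
]


def concept_for_code(code, event_date, concepts=CONCEPTS):
    """Return the concept_id whose validity interval covers event_date, or None."""
    matches = [
        concept_id
        for concept_id, concept_code, start, end in concepts
        if concept_code == code and start <= event_date <= end
    ]
    if len(matches) > 1:
        # Overlapping intervals would make the lookup ambiguous.
        raise ValueError(f"overlapping validity intervals for code {code}")
    return matches[0] if matches else None
```

With non-overlapping intervals, an event dated 2010 resolves to the older concept and an event dated 2020 to the newer one, which is why precise validity dates matter for cohort building.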
- CPT4 has been updated according to the 2024AA version of UMLS Metathesaurus.
- Relationships’ level overview:
- Newly added immunization codes 90624 and 90695 have been mapped to Standard concepts in the Drug domain.
- We expanded the CPT4 - SNOMED hierarchy by embedding CPT4 anesthesia-related codes into the SNOMED hierarchy.
- ‘CPT4 - SNOMED cat’ relationships will continue to contribute to the hierarchy. After extensive testing, we concluded that excluding them from the hierarchy leaves a large number of standard procedure codes outside the concept_ancestor table.
- CMS Place of Service vocabulary has been updated according to the May 2, 2024 version.
- Concepts’ level overview:
- We added a note ‘(Deprecated)’ to the names of the codes deprecated by the source concepts. These codes are maintained as Standard (unless they are mapped).
- Concept ‘27, Inpatient Long-term Care (Deprecated)’ has been deprecated by the source and its concept code was reused. According to our current policy, we do not update the names of the reused codes.
- Relationships’ level overview:
- Concepts ‘02, Telehealth Provided Other than in Patient’s Home’ and ‘10, Telehealth Provided in Patient’s Home’ have been destandardized and mapped to the Standard Telehealth concept in the Visit vocabulary. Other concepts previously mapped to either of these two concepts have been remapped to the same Telehealth concept.
- Concepts’ level overview:
- A new OMOP-created concept has been added to the Visit vocabulary: OMOP5556618, Telehealth. This concept is intended to be a standard concept for all Telehealth visits, with CMS Place of Service codes 02 and 10 mapped to it.
- Relationships’ level overview:
- Missing ‘Maps to value’ relationships have been added (bugfix)
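To check how the Telehealth change described above surfaces in an ETL, the ‘Maps to’ link can be followed from the Place of Service code to the standard Visit concept. The sketch below uses an in-memory SQLite database with a reduced set of columns from the concept and concept_relationship tables; the concept_id values are placeholders and the `vocabulary_id` strings are assumptions to verify against your own vocabulary download.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_name TEXT,
    vocabulary_id TEXT,
    concept_code TEXT,
    standard_concept TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER,
    concept_id_2 INTEGER,
    relationship_id TEXT,
    invalid_reason TEXT
);
-- concept_id values 1, 2 and 3 are placeholders, not real OMOP ids
INSERT INTO concept VALUES
    (1, 'Telehealth Provided Other than in Patient''s Home', 'CMS Place of Service', '02', NULL),
    (2, 'Telehealth Provided in Patient''s Home', 'CMS Place of Service', '10', NULL),
    (3, 'Telehealth', 'Visit', 'OMOP5556618', 'S');
INSERT INTO concept_relationship VALUES
    (1, 3, 'Maps to', NULL),
    (2, 3, 'Maps to', NULL);
""")


def standard_visit_concept(place_of_service_code):
    """Follow the valid 'Maps to' link from a CMS Place of Service code."""
    return conn.execute(
        """
        SELECT target.concept_code, target.concept_name
        FROM concept AS source
        JOIN concept_relationship AS cr
          ON cr.concept_id_1 = source.concept_id
         AND cr.relationship_id = 'Maps to'
         AND cr.invalid_reason IS NULL
        JOIN concept AS target
          ON target.concept_id = cr.concept_id_2
         AND target.standard_concept = 'S'
        WHERE source.vocabulary_id = 'CMS Place of Service'
          AND source.concept_code = ?
        """,
        (place_of_service_code,),
    ).fetchone()
```

Filtering on `invalid_reason IS NULL` and `standard_concept = 'S'` keeps only current mappings to standard targets, so a code without such a mapping returns `None`.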
- In partnership with the Critical Path Institute, we introduce CDISC (Clinical Data Interchange Standards Consortium), a cornerstone terminology used in SDTM (a data model for clinical trial data). We used the 2023_11 version.
  - Concepts’ level overview:
    - We used the NCI Metathesaurus as the source. We included the full list of CDISC values available.
    - In some cases, concept codes are a composite of a unique source identifier and CDISC code values (where available), separated by ‘-’.
    - The vocabulary is enriched with synonyms to facilitate the ETL process.
    - All the CDISC concepts are non-standard.
  - Relationships’ level overview:
    - Mappings are limited to those that can be either derived algorithmically from the NCI Metathesaurus tables for clear 1:1 mappings, or curated manually for pre-selected, prioritized ones.
- Overview:
- In this release, we incorporate external mapping sources, such as SNOMED - ICD mappings and UMLS, into the ICD Family Common Data Environment.
- ICD10, ICD10CM, ICD9CM, ICD10GM, ICD10CN, CIM10 and KCD7 vocabularies have been updated to take into account new mapping candidates.
- With the integration of additional mapping sources, we continue to harmonize mappings across vocabularies by refactoring groups of concepts and adjusting grouping criteria. Some previously integrated groups have been split to enhance the granularity of mappings.
- ICD10CM:
  - Has been updated to the latest version (FY2025), effective from October 01, 2024.
  - Relationships’ level overview:
    - Existing mappings have been updated following the:
      - SNOMED update
      - Community contribution integration
- ICD10:
  - Relationships’ level overview:
    - Existing mappings have been updated following the SNOMED update.
- ICD9CM:
  - Relationships’ level overview:
    - Existing mappings have been updated following the:
      - SNOMED update
      - Community contribution integration
- ICD10GM:
  - Relationships’ level overview:
    - Existing mappings have been updated following the SNOMED update.
- ICD10CN:
  - Relationships’ level overview:
    - Existing mappings have been updated following the SNOMED update.
- KCD7:
  - Relationships’ level overview:
    - Existing mappings have been updated following the SNOMED update.
- CIM10:
  - Relationships’ level overview:
    - Existing mappings have been updated following the SNOMED update.
- PPI:
  - Concepts’ level overview:
    - Concept class and concept name have been changed for some concepts.
- HCPCS:
  - Relationships’ level overview:
    - The mapping of one drug HCPCS code has been improved using the community contribution input.
- ICD10PCS:
  - Relationships’ level overview:
    - Mappings of some ICD10PCS codes to Standard concepts in the Drug domain have been added.
- ICD10CM:
  - Relationships’ level overview:
    - Mappings have been improved with the help of the community contribution input.
- ICD9CM:
  - Relationships’ level overview:
    - Mappings have been improved with the help of the community contribution input.
- EDI, the Korean Electronic Data Interchange code system, was updated as part of a community contribution initiative/pipeline.
- The vocabulary has been updated to the July 2024 version.
- Source data update was done by Seng Chan You and Yiju Park.
- The resulting content was integrated by the vocabulary team using the stage table transfer approach.
SNOMED UK Drug Extension Module has been excluded from SNOMED OMOP vocabularies. Concepts that belonged to this module have been deprecated and linked to their duplicates in dm+d vocabulary. Their concept codes and names have been replaced by placeholders to avoid misuse. UK Drug Extension concepts in the Route domain have been de-standardized and mapped to standard Routes that belong to other SNOMED Modules.
The concept class Clinical Finding has been split into two concept classes: Disorder and Clinical Finding, as assigned in the source by SNOMED. This decision will help us to assign domains more easily and more precisely.
Concepts that belong to Attribute, Location (except countries), Social Context (except concepts that carry the semantics of relatives, religion, occupation), Physical Force, and Physical Object (except concepts in the Device domain) concept classes have been de-standardized.
Domain assignment has been improved to achieve better accuracy. For example, concepts ‘Positive’, ‘Negative’, and ‘Malignant’ were moved from Spec Disease Status to the Meas Value domain, and countries were moved to the Geography domain.
Pre-coordinated concepts that represent the results of evaluation procedures have been split and mapped to Measurements and their value. Concepts that cannot be properly mapped remain standard concepts in the Measurement domain. Pre-coordinated concepts that represent allergies have been split and mapped to ‘Allergy to substance’ + causative agent in a post-coordinated way.
Concepts that belong to Observable Entity and Procedure concept classes in most cases represented semantic duplicates in the Measurement domain. According to SNOMED official documentation, the deduplication of Measurements should be performed towards Observable Entities as these concepts are supposed to represent the result of the evaluation procedure. However, we noticed that currently, most measurements in the Observable Entity concept class belong to the UK Clinical Extension Module and their hierarchy is much less complete than the hierarchy of Procedures. In such cases, we have built ‘Maps to’ relationships in the opposite direction - from Observable Entities to Procedures. If Observable Entity concepts belong to the SNOMED CT Core Module they remain standard with the duplicative Procedure concepts mapped to them. Deduplication in the Measurement domain included mapping concepts in the Observable Entity concept class to their duplicates in the Staging/Scales concept class.
Secondary neoplasm concepts have been mapped to the Cancer Modifier.
Concepts in the Race and Provider domains have been mapped to standard concepts in the respective domains.
MedDRA remains a classification vocabulary and can be used in two directions: to map source MedDRA codes to OMOP standard codes and to use MedDRA codes as classification concepts for concept set construction. Our efforts in this release have been focused on mapping Preferred Terms (PT), which means that the mapping of LLT terms stays without major changes. More detailed information can be found at the forum.
Using MedDRA concepts as classification will be available for mapped PT and LLT concepts. In this case, ‘Maps to’ links are used for hierarchy construction. PT and LLT concepts without corresponding ‘Maps to’ links will become non-standard.
Concepts that were previously assigned the Visit and Provider domains according to their mapping have been moved to the Observation domain at the Community’s request.
All HCPCS concepts in the Drug domain have become non-standard, as HCPCS is not a standard vocabulary for Drugs. All HCPCS concepts in the Drug domain are supposed to be mapped to standard concepts in the Drug domain. If concepts cannot be mapped due to lack of specificity, they are assigned to the Procedure domain, embedded into the hierarchy of Procedures, and remain standard.
LOINC vocabulary has been updated with its 2.76 Sep 2023 version.
The LOINC-SNOMED hierarchy was significantly expanded. Specifically, we've leveraged the outcomes of LOINC-SNOMED collaboration, particularly the LOINC ontology, to enhance and refine the hierarchical relationships between SNOMED and LOINC Measurements.
In this release, we are refreshing the VANDF (Veterans Health Administration National Drug File) and VA Class vocabularies. VANDF is one of the RxNorm source vocabularies.
We are completely changing the logic of the domain assignment. Previously, all of the concepts within VANDF vocabulary were considered drugs, even though they did not qualify as drugs in OHDSI. Currently, we are using hierarchical relationships with VA Class vocabulary to assign domains correctly.
LOINC vocabulary will be updated with its 2.74 Feb 2023 version.
More COVID-19-related lab tests will be included in the hierarchy of Measurement of Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
We detected a problem in the LOINC - SNOMED hierarchy and will fix it in the upcoming release. Some LOINC concepts had multiple ancestors in SNOMED that are themselves ancestors and descendants of each other:
| | concept_code | concept_name | relationship_id | concept_code | concept_name |
| --- | --- | --- | --- | --- | --- |
| Before | 12849-6 | Protein [Mass/volume] in Peritoneal dialysis fluid --7th specimen | Is a | 74040009 | Protein measurement |
| | 12849-6 | Protein [Mass/volume] in Peritoneal dialysis fluid --7th specimen | Is a | 166776003 | Serum/plasma protein test (descendant of Protein measurement) |
| | 12849-6 | Protein [Mass/volume] in Peritoneal dialysis fluid --7th specimen | Is a | 166809004 | Electrophoresis: paraprotein (descendant of Protein measurement) |
| After | 12849-6 | Protein [Mass/volume] in Peritoneal dialysis fluid --7th specimen | Is a | 74040009 | Protein measurement |
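The Before/After rows above suggest one way to detect the problem: among the SNOMED parents of a LOINC concept, drop every parent that is itself a descendant of another parent in the same set. The sketch below implements that rule only as an illustration; whether the release uses exactly this rule is an assumption, and the `ANCESTORS` lookup is a hand-built stand-in for what concept_ancestor would provide.

```python
# Hypothetical transitive ancestor lookup keyed by SNOMED concept code.
ANCESTORS = {
    "74040009": set(),            # Protein measurement
    "166776003": {"74040009"},    # Serum/plasma protein test
    "166809004": {"74040009"},    # Electrophoresis: paraprotein
}


def prune_redundant_parents(parents, ancestors_of):
    """Keep only parents that have no ancestor among the other parents."""
    parents = set(parents)
    return {
        parent
        for parent in parents
        if not (ancestors_of.get(parent, set()) & (parents - {parent}))
    }


# Parents of LOINC 12849-6 before the fix, taken from the table above.
before = {"74040009", "166776003", "166809004"}
after = prune_redundant_parents(before, ANCESTORS)
```

Parents that are unrelated to each other are all kept, so the rule only fires when the parent set contains an ancestor/descendant pair.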
In the course of the v20230116_major release, a number of CPT4 concepts that carry the semantics of visits were de-standardized and mapped to standard concepts in the Visit domain, and their domain was changed according to the mapping. These domain changes caused a lot of difficulties in the ETL processes because CPT4 codes are not routinely used for creating visit records.
The Vocabulary team considered the Community’s opinion (as per discussions on the OHDSI Forums and CDM WG) and is rolling back the domain changes in the upcoming August release. We are also preserving the mapping of these codes to the standard concepts in the Visit domain, as they still carry the semantics of visits.
Equivalence relationships between CPT4 and SNOMED ('CPT4 - SNOMED eq' and 'CPT4 - SNOMED cat') are created using the inter-vocabulary synonymous and categorical relationships present in the UMLS source, respectively. In the scope of this release, 2163 such links are added. When a new target concept is added to the UMLS source, we deprecate the previous equivalence link for the respective CPT4 concept.
This release enriches the NAACCR vocabulary with mappings built and provided by community contributors. Around 1.5k codes from the 'NAACCR Value' concept class are de-standardized and mapped over to the Cancer Modifier vocabulary. Related NAACCR Variables, with their standard concept and validity parameters, remain intact.
Make sure that your ETL is ready to implement the changes:
- Retrieve the mappings from non-standard NAACCR Values to Cancer Modifier Measurement concepts (relationship_id = ‘Maps to’).
- Populate the measurement_concept_id with concept_id from the standard Cancer Modifier counterparts.
- Do not populate the value_as_concept_id field (even though it may be stored as the value in the source data).
Concept_id for event fields should be selected depending on the mapping of NAACCR Value solely.
Different ETL scenarios are described in the following table:
Source data (NAACCR Variable-Value) | OMOP CDM target | |||||||
Logical group | NAACCR Variable | NAACCR Value | concept code | concept name | ||||
concept code | concept name | std concept | concept code | concept name | std concept | |||
1 | 2860 | CS Mets Eval | S | For cases when value is not mapped or not populated | 2860 | CS Mets Eval | ||
2860 | CS Mets Eval | S | larynx_supraglottic@2860@3 | Meets criteria for AJCC pathologic staging of distant metastasis:||Specimen from metastatic site microscopically positive WITHOUT pre-surgical systemic treatment or radiation |OR specimen from metastatic site microscopically positive, unknown if pre-surg | NULL | OMOP4998856 | Metastasis | |
2.1 | 774 | EOD Regional Nodes | S | For cases when value is not mapped or not populated | 774 | EOD Regional Nodes | ||
774 | EOD Regional Nodes | S | melanoma_choroid_ciliary_body@774@999 | Unknown; regional lymph node(s) not stated|Regional lymph node(s) cannot be assessed|Not documented in patient record||Death Certificate Only | NULL | NA | NA | |
2.2 | 774 | EOD Regional Nodes | S | For cases when value is not mapped or not populated | NA | NA | ||
774 | EOD Regional Nodes | S | net_jejunum_ileum@774@400 | Large mesenteric masses (greater than 2 cm)|Lymph node metastasis greater than 2 cm | NULL | OMOP4998946 | Regional spread to lymph node | |
2.3 | 774 | EOD Regional Nodes | S | For cases when value is not mapped or not populated | 774 | EOD Regional Nodes | ||
liver@774@800 | Regional lymph node(s), NOS|Lymph node(s), NOS | S | liver@774@800 | Regional lymph node(s), NOS|Lymph node(s), NOS | ||||
3 | merkel_cell_penis@2870 | Size of Metastasis in Lymph Nodes | NULL | Irrespective of value | OMOP4998351 | Dimension of Lymph Node | ||
merkel_cell_penis@2870@990 | Metastasis or tumor nests in regional lymph nodes, size cannot be assessed | NULL | OMOP4998946 | Regional spread to lymph node |
*std - standard_concept
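The ETL steps above can be sketched for the first logical group of the table. This is a simplified illustration, not a reference implementation: concept codes stand in for concept_ids, the `VALUE_MAPS_TO` lookup is a hand-copied stand-in for the ‘Maps to’ rows in concept_relationship, and the function and key names are hypothetical. The other logical groups are not covered.

```python
# NAACCR Value concept code -> standard Cancer Modifier concept code.
# In a real ETL this comes from concept_relationship (relationship_id = 'Maps to').
VALUE_MAPS_TO = {
    "larynx_supraglottic@2860@3": "OMOP4998856",  # Metastasis
}


def build_measurement(variable_code, value_code):
    """Pick the measurement concept from the NAACCR Value mapping alone.

    If the value is missing or unmapped, fall back to the standard
    NAACCR Variable, as in logical group 1 of the table.
    """
    target = VALUE_MAPS_TO.get(value_code) if value_code else None
    return {
        "measurement_concept_code": target or variable_code,
        # Per the steps above, value_as_concept_id is left unpopulated.
        "value_as_concept_code": None,
        "measurement_source_value": value_code or variable_code,
    }


mapped = build_measurement("2860", "larynx_supraglottic@2860@3")
unmapped = build_measurement("2860", None)
```

The fallback branch assumes the Variable is itself a standard concept, which holds for 2860 (CS Mets Eval) in the table but should be checked per variable.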
Our team continues remapping smoking-related concepts to the OMOP Extension vocabulary. During our last major release, we created a set of OMOP Extension concepts to accompany the ETL Smoking convention. Progress can be tracked on the OHDSI Forum.
During this release we will focus on remapping SNOMED concepts: the smoking-related SNOMED terms will be mapped to new OMOP Extension concepts. These changes affect smoking-related concept sets and cohorts built on currently standard SNOMED concepts. The top concept of the hierarchy is Findings of tobacco or its derivatives use or exposure. Tobacco users are now defined according to the type of product they use (Smokeless, Electronic, Cigarettes, Cigars, etc.), while cigarette smokers are also classified according to the severity of smoking (Trivial, Light, Moderate, Heavy, Very heavy). The Cigarette pack-years smoked during life concept is intended to capture the cumulative consumption of cigarettes.
Note 1: these changes do not concern the concepts that are adjacent to smoking (for example, Nicotine dependence). Identification of patients who smoke can be based on broader terms (such as nicotine dependence or nicotine abuse), more granular terms (such as fact of smoking, number of pack-years, etc.) or a combination of these. The hierarchy we created enables clean representation and retrieval of the granular concepts. Your concept set design should depend on code utilization in the data you intend to use and on the intended specificity/sensitivity of your phenotype.
Note 2: in some cases post-coordination in data modeling may be changed to pre-coordination and vice versa. This may significantly affect the ETL process through the source_to_concept_map table.
The first four columns describe the source data (various vocabularies*); the last four describe the OMOP CDM target (OMOP Extension).

| source_concept_id | source_concept_name | source_value_concept_id | source_value_concept_name | event_concept_id | concept_name | value_as_concept_id | concept_name |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 4310250 | Ex-smoker | NA | | 1340204 | History of event | 903657 | Cigarette smoker |
| | | | | 903651 | Currently doesn't use tobacco or its derivatives | NA | |
| 4203874 | Smoking monitoring status | For cases when value is not mapped or not populated | | NA | NA | NA | NA |
| | | 4298794 | Smoker | 903657 | Cigarette smoker | | |
* - this might be relevant to custom mapping vocabularies also
Please make sure that the upcoming changes are addressed in your ETLs and studies.