Release planning - OHDSI/Vocabulary-v5.0 GitHub Wiki

This page describes the planned maintenance and improvement activities for the OHDSI Standardized Vocabularies. Below you can find the content of each release and an overview of the planned improvement activities (detailed content to be posted separately).

Contents

Principles

Activities per release

Improvement activities

Principles

1. Stable cadence

As most community members refresh the Vocabularies and the data annually or semi-annually, releases are made twice a year. Such a schedule offers higher productivity, more transparency in the content of the releases, and better version alignment across the community. The two releases (August and February) align with the source release schedules. An intermediate release in May 2023 is planned for work already accomplished.

2. Combination of maintenance and process improvement

Vocabulary work balances:

  • Routine maintenance,
  • Automation, usually across concepts and vocabularies of one domain at a time (overhauls, machinery improvements, etc.),
  • Process improvement (e.g., community contribution guidelines or version control).

3. Prioritization based on the needs of the community

The roadmap is based on a continuous assessment of community needs, in terms of both vocabulary maintenance and process improvement.

4. Transparency

The roadmap is made publicly available.

Activities per release

Roadmap 2024 Q4 - 2025 Q3:

[Figure: Roadmap 2023 - 24]

The plan for 2024 Q4 - 2025 Q3 includes the refreshes of the following commonly used vocabularies: SNOMED-CT, ICD family, Read, RxNorm, CVX, LOINC, HCPCS, ICD10PCS, MedDRA and ATC.

The following table outlines the vocabularies included in each release as per the roadmap above.

Note: mapping curation means reviewing the mappings to ensure quality and precision. Mapping propagation means automatically following the mapping links when the source updates a target term, without curating the links.
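The idea of mapping propagation can be illustrated with a minimal sketch. It assumes, purely for illustration, that mappings and source-side replacements are held in two plain dictionaries; the function name `propagate_mappings` and the data layout are hypothetical and do not reflect the actual vocabulary build scripts.

```python
def propagate_mappings(maps_to, replaced_by):
    """Re-point each mapping to the current target by following replacement links.

    maps_to:     {source_code: target_code}   (hypothetical layout)
    replaced_by: {old_target: new_target}     (updates published by the source)
    No link is reviewed manually; that would be mapping curation.
    """
    propagated = {}
    for source, target in maps_to.items():
        seen = set()
        # Walk the replacement chain; stop on a current target or on a cycle.
        while target in replaced_by and target not in seen:
            seen.add(target)
            target = replaced_by[target]
        propagated[source] = target
    return propagated
```

For example, if code `A` maps to `T1`, and the source has replaced `T1` with `T3` and later `T3` with `T4`, the propagated mapping for `A` points to `T4`, while mappings whose targets were not updated stay unchanged.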

Winter release, February 2025

| Activity | Vocabulary version and modification |
| --- | --- |
| CPT4 | visit mappings rollback + refresh (UMLS 2024AB version) |
| HCPCS | visit mappings rollback + refresh (January 2025 version) |
| SNOMED Int | Refresh (August 2024 version) |
| SNOMED US | Refresh (September 2024 version) |
| SNOMED UK | Refresh (October 2024 version) |
| MedDRA | mapping propagation |
| ICD9CM | mapping propagation |
| ICD10CM (US) | mapping curation |
| ICD10 (int) | mapping curation |
| ICD10CN (China) | mapping propagation |
| ICD10GM (Germany) | mapping propagation |
| CIM10 (France) | mapping propagation |
| KCD7 (Korea) | mapping curation |
| Mesh | mapping propagation |
| CIEL | mapping propagation |
| OPCS4 | mapping propagation |
| CO-CONNECT | mapping propagation |
| UK Biobank | mapping propagation |
| ICD9Proc | mapping propagation |
| VANDF | mapping propagation |
| OMOP Invest Drug | mapping propagation |
| Read | mapping curation |
| RxNorm | Refresh (November 2024 version) |
| RxNorm Extension | Refresh (December 2024 version) |
| CVX | Refresh (January 2025 version) + mapping curation |
| NDC | Refresh (January 2025 version) + mapping curation |
| SPL | Refresh (January 2025 version) |
| ATC | Refresh (2025 version) + continuation of improvement work |
| LOINC | Refresh (August 2024 version) |
| ICD10PCS | Refresh (2025 version) + mapping propagation |
| Post-coordination rollback | - |
| Community contribution | Route Domain refinement* |

Summer release, August 2025

| Activity | Vocabulary version and modification |
| --- | --- |
| CPT4 | Refresh (UMLS 2025AA version) |
| HCPCS | Refresh (July 2025 version) |
| SNOMED Int | Refresh (March 2025 version) |
| SNOMED US | Refresh (March 2025 version) |
| SNOMED UK | Refresh (April 2025 version) |
| MedDRA | Refresh (March 2025 version) |
| ICD9CM | mapping curation |
| ICD10CM (US) | Refresh (FY 2026 version) + mapping curation |
| ICD10 (int) | mapping curation |
| ICD10CN (China) | mapping curation |
| ICD10GM (Germany) | Refresh (2025 version) + mapping curation |
| CIM10 (France) | mapping propagation |
| KCD7 (Korea) | mapping curation |
| Mesh | mapping propagation |
| CIEL | mapping propagation |
| OPCS4 | mapping propagation |
| CO-CONNECT | mapping propagation |
| UK Biobank | mapping propagation |
| ICD9Proc | mapping propagation |
| VANDF | mapping propagation |
| OMOP Invest Drug | mapping propagation |
| Read | mapping curation |
| RxNorm | Refresh (June 2025 version) |
| RxNorm Extension | Refresh (July 2025 version) |
| CVX | Refresh (July 2025 version) + mapping curation |
| NDC | Refresh (July 2025 version) + mapping propagation |
| SPL | Refresh (July 2025 version) |
| ATC | continuation of improvement work |
| LOINC | Refresh (February 2024 version) |
| ICD10PCS | mapping curation |

* Route Domain refinement (simplification and relationships) has been selected as the complex contribution for the February 2025 release. It was previously ratified by the community. More information here.

The complex contribution for August 2025 will be prioritized and selected in February 2025. The full list of current complex contributions (planned or in progress) includes:

| What | Content |
| --- | --- |
| Route simplification and relationships | Create new simple model for routes, link specific SNOMED routes to the new routes (Theresa Bukard) |
| SNOMED Veterinary | Refresh and de-coupling with SNOMED (Julie Green) |
| CIEL | Refresh + modifications (Andrew Kanter) |
| HemOnc | Oncology vocabulary with mixed domains (Jeremy Warner) |
| ICDO3 | Oncology vocabulary refresh (Peter Prinsen) |
| Vaccine ontology | Build a new system to classify vaccines in the OHDSI Standardized Vocabularies (Oliver He) |
| ORPHANET | New non-standard vocabulary for rare diseases, which can be incorporated through an automated procedure using UMLS (Michele Zoch) |
| Spanish Drug Agency vocabulary | Incorporation of Spanish drugs into OHDSI Standardized Vocabularies (Juan Ignacio) |
| DPD | Canadian drug vocabulary refresh (Mahmoud Azimaee) |
| JMDC | Refresh of Japan drug vocabulary (Dmitry Dymshyts) |

In addition to selecting one primary contribution for the release, we will also continuously support stewards in developing complex contributions throughout the year, subject to resource availability.

Roadmap 2023 Q1 - 2024 Q2:

[Figure: Roadmap 2023 - 24]

The plan for 2023 Q1 - 2024 Q2 includes the refreshes of the following commonly used vocabularies: SNOMED-CT, ICD family, Read, RxNorm, CVX, LOINC, HCPCS, ICD10PCS, MedDRA, MeSH, NAACCR, dm+d, as well as improvement activities tailored to the most commonly reported problems described above.

Table 1 outlines the vocabularies included in each release as per the roadmap above.

Table 1. Vocabularies and activities included in each release.

Spring release, May 2023

| Activity | Vocabulary version and modification | Name |
| --- | --- | --- |
| CVX | refresh (20230222 version) and refactored code | Maria, Timur |
| dm+d | refresh (20220927 version) and refactored code | Oleg, Timur |
| HCPCS | improvement + refresh (Apr 2023 version) | Masha, Timur |
| MeSH | refresh (2022 version) and refactored code | Timur |
| NAACCR | mapping addition | Vlad, Timur |
| NDC | refresh (20230319 version) | Oleg |
| RxNorm | refresh (20230306 version) | Oleg, Timur |
| RxNorm Extension | refresh (May 2023 version) | Oleg |
| Smoking hierarchy | mapping addition | Maria, Timur |
| SPL | refresh (20230319 version) | Oleg |

Summer release, August 2023

| Activity | Vocabulary version and modification | Name |
| --- | --- | --- |
| CPT4 | refresh (Spring 2023 version) | Masha, Timur |
| LOINC | refresh (2.74 version) | Maria, Timur |
| NDC | refresh (Aug 2023 version) | Oleg |
| RxNorm | refresh (Aug 2023 version) | Oleg, Timur |
| RxNorm Extension | refresh (Aug 2023 version) | Oleg |
| SPL | refresh (Aug 2023 version) | Oleg |
| VANDF | refresh (20230306 version) | Oleg, Varvara, Timur |
| Community contribution guidelines (part 1) | coverage of basic use cases | Anna, Alex, Christian, Timur |
| Vocabulary Quality System (part 1) | conformance checks publicly available with each release | Alex, Anna, Christian, Timur |

Winter release, February 2024

| Activity | Vocabulary version and modification | Name |
| --- | --- | --- |
| CVX | refresh (Summer-Fall 2023 version) | Maria, Timur |
| LOINC | refresh (Summer-Fall 2023 version) | Maria, Timur |
| Read | mapping refresh | Maria, Irina |
| HCPCS | refresh (Oct 2023 version) | Masha, Timur |
| ICD10PCS | refresh (2023 version) | Masha, Maria, Timur |
| MedDRA | improvement + refresh (version 26, Mar 2023) | Mikita, Timur |
| NDC | refresh (Jan 2023 version) | Oleg |
| RxNorm | refresh (Dec 2023 version) | Oleg, Timur |
| RxNorm Extension | refresh (Feb 2023 version) | Oleg, Timur |
| SPL | refresh (Jan 2023 version) | Oleg |
| SNOMED overhaul | overhaul | Oleg, Timur |
| SNOMED UK | refresh (Spring-Summer 2023 version) | Oleg, Timur |
| SNOMED Int | refresh (Spring 2023 version) | |
| SNOMED US | refresh (Feb 2023 version) | |
| ICD | machinery improvement | Irina, Oleg, Timur |
| ICD9(CM) | mapping improvement | Irina, Oleg |
| ICD10(CM) | refresh (2022/2023 versions) | |
| ICD10 (int) | mapping improvement | |
| ICD10CN (China) | mapping improvement | |
| ICD10GM (Germ) | refresh (2023 version) | |
| CIM10 (France) | refresh (2023 version) | |
| Community contribution guidelines (part 2) | coverage of complex use cases | Anna, Alex, Christian, Timur |
| Vocabulary Quality System (part 2) | standardized system with more complex assessment | Alex, Anna, Christian, Timur |

Summer release, August 2024

| Activity | Vocabulary version and modification | Name |
| --- | --- | --- |
| ATC | overhaul + refresh (2024 version) | Anna, others tbd |
| CPT4 | refresh (2024 version) | Masha, Timur |
| CVX | refresh (2024 version) | Maria, Timur |
| HCPCS | refresh (April 2024 version) | Masha, Timur |
| ICD9(CM) | mapping improvement | Maria, Irina |
| ICD10(CM) | refresh (2023/2024 versions) | |
| ICD10 (int) | mapping improvement | |
| ICD10CN (China) | mapping improvement | |
| ICD10GM (Germ) | refresh (2023/2024 versions) | |
| CIM10 (France) | refresh (2023/2024 versions) | |
| LOINC | refresh (2024 version) | Maria, Timur |
| MedDRA | refresh (2024 version) | Mikita, Timur |
| NDC | refresh (Aug 2024 version) | Oleg |
| OMOP Invest Drug | refresh (2024 version) | Oleg, Varvara, Timur |
| Read | mapping refresh | Maria |
| RxNorm | refresh (Feb 2024 version) | Oleg, Timur |
| RxNorm Extension | refresh (Aug 2024 version) | Oleg |
| SNOMED Int | refresh (Feb 2024 version) | Masha, Timur |
| SNOMED UK | refresh (Apr 2024 version) | |
| SNOMED US | refresh (Mar 2024 version) | |
| SPL | refresh (Aug 2024 version) | Oleg |
| VANDF | refresh (2024 version) | Varvara, Timur |

Improvement activities

Vocabulary-specific overhauls and improvements include:

1. SNOMED overhaul

  • Stable domain and concept class id assignment.
  • Alignment of the validity dates with the source.
  • Fix of the problem where replacement relationships (such as “Concept replaced by”) lack “Maps to” links, which prevents users from automatically following the “Maps to” relationships from non-standard concepts to their standard counterparts.
  • Clean-up of existing legacy “Maps to” relationships originating from “Concept is a possible equivalent to”.
  • De-standardize and map the concepts in Drug and other (Race, Provider) domains to the standard concepts so that they can be effectively used in the sources that use SNOMED-CT (such as CPRD).
  • Split up the pre-coordinated concepts (such as lab tests with results, or allergies to specific substances) and map them over to the respective concepts.
  • Documentation of SNOMED-CT processing, domain assignment and quality assurance.

2. ICD family improvement

  • Mapping re-use across ICD family to identify the discrepancies and similarities across different versions of ICD and improve the consistency of mappings.
  • Incorporation of the mappings provided by SNOMED-CT and other sources.
  • Fix of the source (CIAML) file processing to capture the ICD concepts currently missing.
  • Documentation of the current procedures for mapping and quality assurance.

3. MedDRA improvement

  • Design and document the model that would allow MedDRA to be used as both a source and a Classification terminology in the Condition Domain.
  • Development of a system that would allow us to re-use the mappings of various sources (MedDRA-SNOMED initiative, UMLS), build our own based on the user needs, annotate them with metadata using SSSOM or other standards, and automatically transform them using generated metadata in both horizontal and hierarchical relationships.
  • Build of “Maps to” relationships from MedDRA to SNOMED.
  • Build of hierarchical relationships between MedDRA and SNOMED.

4. ATC overhaul

  • Adopt the data-driven approach of attribute selection (RxNorm and RxNorm Extension attributes for ATC codes) based on the data sources that have ATC codes (Z index, JMDC, others).
  • Identification of discrepancies and similarities between code assignment in different data sources to establish more consistent and accurate mappings from ATC to RxNorm (Ext).
  • Validation of the vocabulary using data-driven approaches (including currently existing comparison for 1:1 matching to Clinical Drug Form and further expansion to comparison of the assignments for Clinical Drug, Branded Drug and 1:many matching).
  • If feasible, incorporation of WHO ATC-drug product links and DDD represented in machine-readable form.
  • Hierarchy review, fix and documentation.

Process improvement activities include:

5. Community contribution guidelines

We divide the guidelines and processes into two parts, with the first part rolled out by the August 2023 release and the second part by the February 2024 release.

The first part will handle simple use cases such as changing “Maps to”, changing concept names and domains, adding or deprecating relationships or adding small vocabularies with no internal hierarchy. We will establish the pipeline for incoming requests with clear communication on when they will be incorporated. The pipeline involves submitting a request on GitHub with filled templates that follow stage tables’ structure to facilitate incorporation, instructions on how to fill them and quality assurance checks that need to be performed on the requester side. GitHub requests will facilitate version control and serve for educational purposes for other contributors. We will use existing requests that have not been fulfilled (such as ethnicity codes provided by the Health Equity WG, NIH provider codes and vocabulary, etc.) for dry runs and illustrative purposes.

The second part will target more complex use cases, such as adding new vocabularies and changing hierarchies, and therefore requires more comprehensive approaches (common development environment, automated scripts for quality assurance, maintenance scripts if applicable) that build into a system for community contribution. Potential use cases for dry runs include ICPC2, which consists of adding a vocabulary, new codes and mappings to existing standard concepts.

As we have a standardized system for incorporating drug vocabularies (which, as opposed to other domains, influence standard vocabularies [RxNorm Extension] and therefore require more robust QA), drug vocabularies will be separated into a distinct chapter in the guidelines following the existing guides for contributors.

Community contribution guidelines will also include guidance and best practices on how to locally add new concepts (in the form of “2 billion” codes, i.e., concept IDs above 2,000,000,000) and relationships (in the form of source_to_concept_map or concept_relationship) or modify relationships, to enable research in those organizations and teams that require such modifications before they are released.
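As a rough illustration of local additions, the sketch below assigns concept IDs from the range above 2,000,000,000 to local codes and builds a row shaped like a source_to_concept_map entry. The function names, the starting ID, and the vocabulary name `MY_LOCAL_VOCAB` are assumptions made for this example, and only a subset of the table's columns is shown; this is not the procedure the guidelines will prescribe.

```python
from itertools import count

# Assumption: local concepts use IDs above 2,000,000,000 so they cannot
# collide with concept IDs issued in the official releases.
LOCAL_ID_FLOOR = 2_000_000_000


def assign_local_concept_ids(source_codes, first_id=LOCAL_ID_FLOOR + 1):
    """Give each distinct local source code its own concept ID in the local range."""
    if first_id <= LOCAL_ID_FLOOR:
        raise ValueError("local concept IDs must be above 2,000,000,000")
    ids = count(first_id)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return {code: next(ids) for code in dict.fromkeys(source_codes)}


def to_source_to_concept_map(source_code, source_vocabulary_id, target_concept_id):
    """Build one mapping row as a dict (illustrative subset of columns)."""
    return {
        "source_code": source_code,
        "source_vocabulary_id": source_vocabulary_id,
        "target_concept_id": target_concept_id,
    }


# Hypothetical usage: two local lab codes mapped to their new local concepts.
local_ids = assign_local_concept_ids(["LAB-001", "LAB-002", "LAB-001"])
rows = [to_source_to_concept_map(code, "MY_LOCAL_VOCAB", cid)
        for code, cid in local_ids.items()]
```

Keeping local IDs in a separate range means a later official release can be loaded without renumbering the local additions.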

The guidelines and approaches will be shared with the committee and subsequently with the community for feedback.

6. Vocabulary Quality System

We divide the Vocabulary Quality System into two parts, with the first part rolled out by the August 2023 release and the second part by the February 2024 release.

The first part (quality control) includes describing existing procedures, making the documentation publicly available, and adding to each release reports on the conformance checks passed and descriptive statistics (structure of the vocabularies, mapping coverage, gaps in hierarchies, orphan codes and more). It also includes expanding the tests to ensure comprehensive coverage based on the previously reported problems.
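One of the descriptive statistics mentioned above, mapping coverage, could be computed along the following lines. This is a sketch under stated assumptions: relationships are given as `(concept_id_1, relationship_id, concept_id_2)` tuples, coverage is defined here as the share of non-standard concepts with at least one “Maps to” link, and the function name is hypothetical. The actual release reports may define and compute these statistics differently.

```python
def mapping_coverage(non_standard_ids, relationships):
    """Return (coverage, unmapped_ids) for a set of non-standard concepts.

    relationships: iterable of (concept_id_1, relationship_id, concept_id_2).
    A concept counts as mapped if it is the origin of any "Maps to" link.
    """
    candidates = set(non_standard_ids)
    mapped = {c1 for c1, rel, _c2 in relationships if rel == "Maps to"}
    unmapped = sorted(candidates - mapped)
    if not candidates:
        return 1.0, unmapped  # nothing to map, so nothing is missing
    coverage = (len(candidates) - len(unmapped)) / len(candidates)
    return coverage, unmapped
```

The list of unmapped concepts returned next to the coverage figure gives reviewers a concrete worklist rather than only a percentage.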

The second part (quality management system) includes designing a quality system with more complex completeness and plausibility checks and external validation. A systematic approach needs to be developed, and the existing practices in other ontologies will be taken into consideration. As there is a lack of frameworks (analogous to Kahn’s framework for data quality) for complex systems that harmonize and align multiple ontologies, this part will require more research and collaboration among the experts in the OHDSI community.
