Release planning - OHDSI/Vocabulary-v5.0 GitHub Wiki
This page describes the planned maintenance and improvement activities for the OHDSI Standardized Vocabularies. Below you can find the content of each release and an overview of the planned improvement activities (detailed content to be posted separately).
As most community members refresh the Vocabularies and their data annually or semi-annually, releases are issued twice a year. This cadence improves productivity, makes the content of each release more transparent, and helps the community stay aligned on versions. The two releases (August and February) are timed to match the source release schedules. An intermediate release in May 2023 is planned for work already accomplished.
Vocabulary work balances:
- Routine maintenance,
- Automation, usually across concepts and vocabularies of one domain at a time (overhauls, machinery improvements, etc.),
- Process improvement (e.g., community contribution guidelines or version control).
The roadmap is based on a continuous assessment of community needs, covering both vocabulary maintenance and process improvement.
The roadmap is made publicly available.
The plan for 2024 Q4 - 2025 Q3 includes the refreshes of the following commonly used vocabularies: SNOMED-CT, ICD family, Read, RxNorm, CVX, LOINC, HCPCS, ICD10PCS, MedDRA and ATC.
The following table outlines the vocabularies included in each release as per the roadmap above.
Note: mapping curation means reviewing the mappings to ensure quality and precision. Mapping propagation means automatically following the mapping links when the source updates a target term, without curating the links.
Activity | Vocabulary version and modification |
Winter release, February 2025 | |
CPT4 | visit mappings rollback + refresh (UMLS 2024AB version) |
HCPCS | visit mappings rollback + refresh (January 2025 version) |
SNOMED Int | Refresh (August 2024 version) |
SNOMED US | Refresh (September 2024 version) |
SNOMED UK | Refresh (October 2024 version) |
MedDRA | mapping propagation |
ICD9CM | mapping propagation |
ICD10CM (US) | mapping curation |
ICD10 (int) | mapping curation |
ICD10CN (China) | mapping propagation |
ICD10GM (Germany) | mapping propagation |
CIM10 (France) | mapping propagation |
KCD7 (Korea) | mapping curation |
MeSH | mapping propagation |
CIEL | mapping propagation |
OPCS4 | mapping propagation |
CO-CONNECT | mapping propagation |
UK Biobank | mapping propagation |
ICD9Proc | mapping propagation |
VANDF | mapping propagation |
OMOP Invest Drug | mapping propagation |
Read | mapping curation |
RxNorm | Refresh (November 2024 version) |
RxNorm Extension | Refresh (December 2024 version) |
CVX | Refresh (January 2025 version) + mapping curation |
NDC | Refresh (January 2025 version) + mapping curation |
SPL | Refresh (January 2025 version) |
ATC | Refresh (2025 version) + continuation of improvement work |
LOINC | Refresh (August 2024 version) |
ICD10PCS | Refresh (2025 version) + mapping propagation |
Post-coordination rollback | - |
Community contribution | Route Domain refinement* |
Summer release, August 2025 | |
CPT4 | Refresh (UMLS 2025AA version) |
HCPCS | Refresh (July 2025 version) |
SNOMED Int | Refresh (March 2025 version) |
SNOMED US | Refresh (March 2025 version) |
SNOMED UK | Refresh (April 2025 version) |
MedDRA | Refresh (March 2025 version) |
ICD9CM | mapping curation |
ICD10CM (US) | Refresh (FY 2026 version) + mapping curation |
ICD10 (int) | mapping curation |
ICD10CN (China) | mapping curation |
ICD10GM (Germany) | Refresh (2025 version) + mapping curation |
CIM10 (France) | mapping propagation |
KCD7 (Korea) | mapping curation |
MeSH | mapping propagation |
CIEL | mapping propagation |
OPCS4 | mapping propagation |
CO-CONNECT | mapping propagation |
UK Biobank | mapping propagation |
ICD9Proc | mapping propagation |
VANDF | mapping propagation |
OMOP Invest Drug | mapping propagation |
Read | mapping curation |
RxNorm | Refresh (June 2025 version) |
RxNorm Extension | Refresh (July 2025 version) |
CVX | Refresh (July 2025 version) + mapping curation |
NDC | Refresh (July 2025 version) + mapping propagation |
SPL | Refresh (July 2025 version) |
ATC | continuation of improvement work |
LOINC | Refresh (February 2024 version) |
ICD10PCS | mapping curation |
* Route Domain refinement (simplification and relationships) has been selected as the complex contribution for the February 2025 release. It was previously ratified by the community. More information here.
The complex contribution for August 2025 will be prioritized and selected in February 2025. The full list of current complex contributions (planned or in progress) includes:
What | Content |
Route simplification and relationships | Create a new simple model for routes, link specific SNOMED routes to the new routes (Theresa Bukard) |
SNOMED Veterinary | Refresh and de-coupling from SNOMED (Julie Green) |
CIEL | Refresh + modifications (Andrew Kanter) |
HemOnc | Oncology vocabulary with mixed domains (Jeremy Warner) |
ICDO3 | Oncology vocabulary refresh (Peter Prinsen) |
Vaccine ontology | Build a new system to classify vaccines in the OHDSI Standardized Vocabularies (Oliver He) |
ORPHANET | New non-standard vocabulary for rare diseases, which can be incorporated through automated procedure using UMLS (Michele Zoch) |
Spanish Drug Agency vocabulary | Incorporation of Spanish drugs into OHDSI Standardized Vocabularies (Juan Ignacio) |
DPD | Canadian drug vocabulary refresh (Mahmoud Azimaee) |
JMDC | Refresh of Japanese drug vocabulary (Dmitry Dymshyts) |
In addition to selecting one primary contribution for the release, we will also continuously support stewards with developing complex contributions throughout the year, subject to resource availability.
The plan for 2023 Q1 - 2024 Q2 includes the refreshes of the following commonly used vocabularies: SNOMED-CT, ICD family, Read, RxNorm, CVX, LOINC, HCPCS, ICD10PCS, MedDRA, MeSH, NAACCR, dm+d, as well as improvement activities tailored to the most commonly reported problems described above.
Table 1 outlines the vocabularies included in each release as per the roadmap above.
Table 1. Vocabularies and activities included in each release.
Activity | Vocabulary version and modification | Name |
Spring release, May 2023 | ||
CVX | refresh (20230222 version) and refactored code | Maria, Timur |
dm+d | refresh (20220927 version) and refactored code | Oleg, Timur |
HCPCS | improvement + refresh (Apr 2023 version) | Masha, Timur |
MeSH | refresh (2022 version) and refactored code | Timur |
NAACCR | mapping addition | Vlad, Timur |
NDC | refresh (20230319 version) | Oleg |
RxNorm | refresh (20230306 version) | Oleg, Timur |
RxNorm Extension | refresh (May 2023 version) | Oleg |
Smoking hierarchy | mapping addition | Maria, Timur |
SPL | refresh (20230319 version) | Oleg |
Summer release, August 2023 | ||
CPT4 | refresh (Spring 2023 version) | Masha, Timur |
LOINC | refresh (2.74 version) | Maria, Timur |
NDC | refresh (Aug 2023 version) | Oleg |
RxNorm | refresh (Aug 2023 version) | Oleg, Timur |
RxNorm Extension | refresh (Aug 2023 version) | Oleg |
SPL | refresh (Aug 2023 version) | Oleg |
VANDF | refresh (20230306 version) | Oleg, Varvara, Timur |
Community contribution guidelines (part 1) | coverage of basic use cases | Anna, Alex, Christian, Timur |
Vocabulary Quality System (part 1) | conformance checks publicly available with each release | Alex, Anna, Christian, Timur |
Winter release, February 2024 | ||
CVX | refresh (Summer-Fall 2023 version) | Maria, Timur |
LOINC | refresh (Summer-Fall 2023 version) | Maria, Timur |
Read | mapping refresh | Maria, Irina |
HCPCS | refresh (Oct 2023 version) | Masha, Timur |
ICD10PCS | refresh (2023 version) | Masha, Maria, Timur |
MedDRA | improvement + refresh (version 26, Mar 2023) | Mikita, Timur |
NDC | refresh (Jan 2023 version) | Oleg |
RxNorm | refresh (Dec 2023 version) | Oleg, Timur |
RxNorm Extension | refresh (Feb 2023 version) | Oleg, Timur |
SPL | refresh (Jan 2023 version) | Oleg |
SNOMED overhaul | overhaul | Oleg, Timur |
SNOMED UK | refresh (Spring-Summer 2023 version) | Oleg, Timur |
SNOMED Int | refresh (Spring 2023 version) | |
SNOMED US | refresh (Feb 2023 version) | |
ICD | machinery improvement | Irina, Oleg, Timur |
ICD9(CM) | mapping improvement | Irina, Oleg |
ICD10(CM) | refresh (2022/2023 versions) | |
ICD10 (int) | mapping improvement | |
ICD10CN (China) | mapping improvement | |
ICD10GM (Germ) | refresh (2023 version) | |
CIM10 (France) | refresh (2023 version) | |
Community contribution guidelines (part 2) | coverage of complex use cases | Anna, Alex, Christian, Timur |
Vocabulary Quality System (part 2) | standardized system with more complex assessment | Alex, Anna, Christian, Timur |
Summer release, August 2024 | ||
ATC | overhaul + refresh (2024 version) | Anna, others tbd |
CPT4 | refresh (2024 version) | Masha, Timur |
CVX | refresh (2024 version) | Maria, Timur |
HCPCS | refresh (April 2024 version) | Masha, Timur |
ICD9(CM) | mapping improvement | Maria, Irina |
ICD10(CM) | refresh (2023/2024 versions) | |
ICD10 (int) | mapping improvement | |
ICD10CN (China) | mapping improvement | |
ICD10GM (Germ) | refresh (2023/2024 versions) | |
CIM10 (France) | refresh (2023/2024 versions) | |
LOINC | refresh (2024 version) | Maria, Timur |
MedDRA | refresh (2024 version) | Mikita, Timur |
NDC | refresh (Aug 2024 version) | Oleg |
OMOP Invest Drug | refresh (2024 version) | Oleg, Varvara, Timur |
Read | mapping refresh | Maria |
RxNorm | refresh (Feb 2024 version) | Oleg, Timur |
RxNorm Extension | refresh (Aug 2024 version) | Oleg |
SNOMED Int | refresh (Feb 2024 version) | Masha, Timur |
SNOMED UK | refresh (Apr 2024 version) | |
SNOMED US | refresh (Mar 2024 version) | |
SPL | refresh (Aug 2024 version) | Oleg |
VANDF | refresh (2024 version) | Varvara, Timur |
Vocabulary-specific overhauls and improvements include:
- Stable domain and concept class id assignment.
- Alignment of the validity dates with the source.
- Fix of the problem where replacement relationships (such as “Concept replaced by”) lack corresponding “Maps to” links, which prevents users from automatically following the “Maps to” relationships from non-standard concepts to their standard counterparts.
- Clean-up of existing legacy “Maps to” relationships originating from “Concept is a possible equivalent to”.
- De-standardize and map the concepts in Drug and other (Race, Provider) domains to the standard concepts so that they can be effectively used in the sources that use SNOMED-CT (such as CPRD).
- Split up the pre-coordinated concepts (such as lab test with the results, allergies to the specific substances) and map them over to the respective concepts.
- Documentation of SNOMED-CT processing, domain assignment and quality assurance.
- Mapping re-use across ICD family to identify the discrepancies and similarities across different versions of ICD and improve the consistency of mappings.
- Incorporation of the mappings provided by SNOMED-CT and other sources.
- Fix of the source (CIAML) file processing to capture the ICD concepts currently missing.
- Documentation of the current procedures for mapping and quality assurance.
- Design and documentation of a model that would allow MedDRA to be used as both a source and a classification terminology in the Condition Domain.
- Development of a system that would allow us to re-use the mappings of various sources (MedDRA-SNOMED initiative, UMLS), build our own based on user needs, annotate them with metadata using SSSOM or other standards, and automatically transform them, using the generated metadata, into both horizontal and hierarchical relationships.
- Build of “Maps to” relationships from MedDRA to SNOMED.
- Build of hierarchical relationships between MedDRA and SNOMED.
- Adopt the data-driven approach of attribute selection (RxNorm and RxNorm Extension attributes for ATC codes) based on the data sources that have ATC codes (Z index, JMDC, others).
- Identification of discrepancies and similarities between code assignment in different data sources to establish more consistent and accurate mappings from ATC to RxNorm (Ext).
- Validation of the vocabulary using data-driven approaches (including currently existing comparison for 1:1 matching to Clinical Drug Form and further expansion to comparison of the assignments for Clinical Drug, Branded Drug and 1:many matching).
- If feasible, incorporation of WHO ATC-drug product links and DDD represented in the machine-readable form.
- Hierarchy review, fix and documentation.
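The replacement-relationship item in the list above can be illustrated with a minimal sketch. It assumes relationships are held in memory as `(concept_id_1, relationship_id, concept_id_2)` tuples, mirroring the columns of the `concept_relationship` table. The helper names `resolve_maps_to` and `derive_missing_maps_to`, the concept IDs, and the set of standard concepts are all hypothetical; this is not the Vocabulary team's actual procedure.

```python
from typing import Iterable, List, Optional, Set, Tuple

# (concept_id_1, relationship_id, concept_id_2); an assumed in-memory shape
Relationship = Tuple[int, str, int]


def resolve_maps_to(
    concept_id: int,
    relationships: Iterable[Relationship],
    standard_ids: Set[int],
) -> Optional[int]:
    """Follow 'Concept replaced by' links until a standard concept is reached.

    Returns the standard target, or None if the chain dead-ends or loops.
    """
    replaced_by = {
        src: dst for src, rel, dst in relationships if rel == "Concept replaced by"
    }
    seen: Set[int] = set()
    current = concept_id
    while current not in standard_ids:
        if current in seen or current not in replaced_by:
            return None  # cycle, or no further replacement to follow
        seen.add(current)
        current = replaced_by[current]
    return current


def derive_missing_maps_to(
    relationships: Iterable[Relationship],
    standard_ids: Set[int],
) -> List[Relationship]:
    """Propose 'Maps to' rows for replaced concepts that do not have one yet."""
    rels = list(relationships)
    has_maps_to = {src for src, rel, _ in rels if rel == "Maps to"}
    derived: List[Relationship] = []
    for src, rel, _ in rels:
        if rel != "Concept replaced by" or src in has_maps_to:
            continue
        target = resolve_maps_to(src, rels, standard_ids)
        if target is not None:
            derived.append((src, "Maps to", target))
    return derived


# Hypothetical data: 1 -> 2 -> 3 (standard); 2 already maps to 3; 4 -> 5 dead-ends.
example = [
    (1, "Concept replaced by", 2),
    (2, "Concept replaced by", 3),
    (2, "Maps to", 3),
    (4, "Concept replaced by", 5),
]
proposed = derive_missing_maps_to(example, standard_ids={3})
```

In this sketch, chains that end at a non-standard concept are left unmapped rather than guessed, so they would still need manual review.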
Process improvement activities include:
We divide the guidelines and processes into two parts, with the first part rolled out by the August 2023 release and the second part by the February 2024 release.
The first part will handle simple use cases such as changing “Maps to”, changing concept names and domains, adding or deprecating relationships or adding small vocabularies with no internal hierarchy. We will establish the pipeline for incoming requests with clear communication on when they will be incorporated. The pipeline involves submitting a request on GitHub with filled templates that follow stage tables’ structure to facilitate incorporation, instructions on how to fill them and quality assurance checks that need to be performed on the requester side. GitHub requests will facilitate version control and serve for educational purposes for other contributors. We will use existing requests that have not been fulfilled (such as ethnicity codes provided by the Health Equity WG, NIH provider codes and vocabulary, etc.) for dry runs and illustrative purposes.
The second part will target more complex use cases such as adding new vocabularies and changing hierarchies and therefore requires more comprehensive approaches (common development environment, automated scripts for quality assurance, maintenance scripts if applicable), building into a system for community contribution. Potential use cases for dry runs include ICPC2, which consists of adding a vocabulary, new codes and mappings to existing standard concepts.
As we have a standardized system for incorporating drug vocabularies (which, as opposed to other domains, influence standard vocabularies [RxNorm Extension] and therefore require more robust QA), drug vocabularies will be separated into a distinct chapter in the guidelines following the existing guides for contributors.
Community contribution guidelines will also include guidance and best practices on how to locally add new concepts (as "2 billion" codes, that is, concept IDs above 2 billion) and relationships (in the form of source_to_concept_map or concept_relationship) or modify relationships, to enable research in those organizations and teams that require such modifications before they are released.
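As a rough illustration of adding local concepts, the sketch below assigns concept IDs above 2 billion to local codes and builds rows shaped like a subset of the `source_to_concept_map` columns. The starting ID, the `allocate_local_concepts` and `to_source_to_concept_map` helpers, the vocabulary name, the example codes, and the target concept ID are all assumptions made for illustration. Using 0 as the target for unmapped codes and filling `source_concept_id` with the local ID are choices of this sketch, not guidance from the roadmap.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Iterable, List, Tuple

# Assumed starting point: the first ID above the 2-billion boundary.
LOCAL_CONCEPT_ID_START = 2_000_000_001


@dataclass(frozen=True)
class LocalConcept:
    concept_id: int
    concept_code: str
    concept_name: str
    vocabulary_id: str


def allocate_local_concepts(
    codes: Iterable[Tuple[str, str]],
    vocabulary_id: str,
    start: int = LOCAL_CONCEPT_ID_START,
) -> List[LocalConcept]:
    """Assign sequential concept_ids above 2 billion to (code, name) pairs."""
    ids = count(start)
    return [LocalConcept(next(ids), code, name, vocabulary_id) for code, name in codes]


def to_source_to_concept_map(
    concepts: Iterable[LocalConcept],
    targets: Dict[str, int],
) -> List[dict]:
    """Build rows with a subset of source_to_concept_map columns.

    Codes without a known target get target_concept_id 0 (sketch convention).
    """
    return [
        {
            "source_code": c.concept_code,
            "source_concept_id": c.concept_id,
            "source_vocabulary_id": c.vocabulary_id,
            "source_code_description": c.concept_name,
            "target_concept_id": targets.get(c.concept_code, 0),
        }
        for c in concepts
    ]


# Hypothetical local lab codes; 9999999 stands in for a real standard concept_id.
local = allocate_local_concepts(
    [("LAB-001", "Local glucose panel"), ("LAB-002", "Local lipid panel")],
    vocabulary_id="MySite Lab",
)
rows = to_source_to_concept_map(local, targets={"LAB-001": 9999999})
```

Keeping the allocation in one place makes it easier to avoid ID collisions between teams that add local concepts independently.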
The guidelines and approaches will be shared with the committee and subsequently with the community for feedback.
We divide the Vocabulary Quality System into two parts, with the first part rolled out by the August 2023 release and the second part by the February 2024 release.
The first part (quality control) includes describing existing procedures and making the documentation publicly available and adding the reports about passing the conformance checks and descriptive statistics (structure of the vocabularies, mapping coverage, gaps in hierarchies, orphan codes and more) to each release. It also includes expanding the tests to ensure comprehensive coverage based on the previously reported problems.
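Two of the report items mentioned above, mapping coverage and a conformance check, could look roughly like the following sketch. The `Concept` shape, the function names, and the specific rule checked (every "Maps to" target should be a standard concept) are assumptions for illustration; the actual checks shipped with each release may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Concept:
    concept_id: int
    vocabulary_id: str
    standard_concept: Optional[str]  # 'S' for standard, None for non-standard


def mapping_coverage(
    concepts: Iterable[Concept],
    maps_to: Iterable[Tuple[int, int]],
) -> Dict[str, float]:
    """Share of non-standard concepts per vocabulary with at least one 'Maps to'."""
    mapped_sources = {src for src, _ in maps_to}
    total: Dict[str, int] = defaultdict(int)
    mapped: Dict[str, int] = defaultdict(int)
    for c in concepts:
        if c.standard_concept == "S":
            continue  # coverage is only measured for non-standard concepts
        total[c.vocabulary_id] += 1
        if c.concept_id in mapped_sources:
            mapped[c.vocabulary_id] += 1
    return {vocab: mapped[vocab] / n for vocab, n in total.items()}


def non_standard_targets(
    concepts: Iterable[Concept],
    maps_to: Iterable[Tuple[int, int]],
) -> List[Tuple[int, int]]:
    """Conformance check: list 'Maps to' pairs whose target is not standard."""
    standard = {c.concept_id for c in concepts if c.standard_concept == "S"}
    return [(src, dst) for src, dst in maps_to if dst not in standard]


# Hypothetical concepts and (source, target) 'Maps to' pairs.
concepts = [
    Concept(1, "ICD10CM", None),
    Concept(2, "ICD10CM", None),
    Concept(3, "SNOMED", "S"),
    Concept(4, "Read", None),
]
maps_to = [(1, 3), (4, 2)]

coverage = mapping_coverage(concepts, maps_to)
violations = non_standard_targets(concepts, maps_to)
```

Reporting coverage per vocabulary, rather than as one overall number, keeps a small but poorly mapped vocabulary from being hidden by a large, well-mapped one.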
The second part (quality management system) includes designing a quality system with more complex completeness and plausibility checks and external validation. A systematic approach needs to be developed, and the existing practices in other ontologies will be taken into consideration. As there is a lack of frameworks (analogous to Kahn’s framework for data quality) for complex systems that harmonize and align multiple ontologies, this part will require more research and collaboration among the experts in the OHDSI community.