Community contribution: non‐drug vocabularies p.I - OHDSI/Vocabulary-v5.0 GitHub Wiki

This document is intended to be used for contributing content to the OHDSI Standardized Vocabularies excluding drug vocabularies.

Authors: Anna Ostropolets, Alexander Davydov, Oleg Zhuk, Vlad Korsik, Timur Vakhitov, Christian Reich

Version: 1.0

Date of last modification: 06/05/2023

Acknowledgement: We want to thank the Vocabulary Committee and community (especially Solmaz Oskoui) for feedback on this document.

Scope and content of this document

Here, we aim to describe how you can get involved in the production of the OHDSI Standardized Vocabularies (hereafter, community contribution).

Community contribution pipeline and guidelines will be rolled out in two parts.

This document (Part I) covers community contribution related to simple use cases such as addition of concepts or modification of relationships and outlines the steps for community contribution at the initial stages of Vocabularies development. Here, you will find the directions on how to submit a request, prepare files, assess their quality, and track the progress of your submission.

It does NOT cover the final stage that must be performed by the Vocabulary Team (final quality assurance [QA] and release), transformation of interim tables into the production-format tables or contribution to Athena source code (to be developed later). Part II (to be developed later) will cover more complex use cases that require more robust quality assurance and/or automated or semi-automated pipelines and include construction of complex hierarchies, dependencies, complex modification of standard vocabularies, etc. Description of the quality assurance procedures performed by the Vocabulary Team and vocabulary-related documentation are not included in this document.

The rest of the document is structured as follows. The 'Prerequisites' section points you to the resources that describe the Vocabularies in more detail and provide an overview of the current process and how community contribution fits in. 'Vocabulary development process' outlines the current process of importing, processing, and distributing vocabularies. It is followed by a description of the types of contribution ('Types of contributions'), with a specific focus on Part I, and then by the directions ('Directions'). Additionally, details on how to fill in the templates for each type of contribution can be found in the corresponding templates (please see 'Directions' for links).

Prerequisites

If you have not done so, please familiarize yourself with the content and structure of the OHDSI Standardized Vocabularies through the EHDEN Academy course, GitHub Wiki, tutorial or Book of OHDSI.

In brief, OHDSI Vocabularies as of the beginning of 2023 encompass more than 130 vocabularies that are imported, manipulated and released by the OHDSI Vocabulary Team, which is a part of the OHDSI CDM Working Group.

Vocabulary development process

Some of the community contributions require knowledge of the source vocabulary structure and rules and/or practices of integrating source vocabularies into the OHDSI Standardized Vocabularies.

For example, integration of drug vocabularies into the OHDSI Standardized Vocabularies follows a standardized approach of splitting source drugs into their attributes and mapping the attributes to standard counterparts, with the subsequent machinery automatically finding the best match for the drugs (more information about the drug vocabulary development will be published later). The complexity of the process and the required knowledge vary by domain, vocabulary and intended modification.

The common steps of the vocabulary development process include vocabulary import from an external source, staging, integration, and release. Briefly, once a vocabulary is obtained from an external source (in an automated fashion or transferred manually from a third party), it goes through standardization or "staging". The output of this step is the CONCEPT_STAGE, CONCEPT_RELATIONSHIP_STAGE and other stage tables, which then go through a set of standardized processing and quality control procedures to create the CONCEPT, CONCEPT_RELATIONSHIP and other tables.

More details on the process can be found here. The community contribution files described in this document are manual stage tables that enter the development process at the staging step and are subsequently integrated into the Vocabularies. Drug vocabularies require different tables; please refer to "Drug vocabulary development" (to be published later).

Types of contributions

In this section, we will describe the use cases covered in this document followed by more detailed instructions.

Part I of the community contribution guidelines covers use cases that can be viewed as one-off contributions, as opposed to those that require regular maintenance. This group of contributions does not involve significant modification of standard vocabularies in the OHDSI Vocabularies, as such modification affects the whole community and requires robust quality assurance and control. Because of that, such contributions can be submitted as user-friendly files (e.g., Excel spreadsheets) and can undergo manual quality control.

The use cases covered in this document can be divided into two distinct activities: adding new content and modifying existing content. Adding new content includes adding a vocabulary as non-standard with or without mappings to existing OHDSI standard concepts, adding non-standard concept(s) or new synonyms to an existing vocabulary, and adding mappings ("Maps to" relationships) for existing concepts. Adding content does not require Vocabulary Team review, and ensuring the quality of the submission is the contributor's responsibility. As a tradeoff, additions can be performed seamlessly. Modification of existing content (changing concept attributes or mappings, or promoting non-standard concepts to standard) requires Vocabulary Team review, as it involves modifying content that may be used by other community members.

Within the scope of this document, if the community wants to add new standard terms, they first need to add them as non-standard and then promote them to standard (please refer to use case #7). Such a process ensures the quality of the content that is used to populate the OMOP CDM and, subsequently, used in research across the community.

Each use case below corresponds to a template on Google Drive that you will need to fill in. Table 1 displays the content of each template. Each template has additional details on how to fill in the fields.

Table 1. Content of the templates for each use case

| # | Type | Tables to fill | Notes |
| --- | --- | --- | --- |
| T1 | Adding new non-standard concept(s) to an existing vocabulary | concept_manual, concept_synonym_manual (if applicable), concept_relationship_manual (if applicable), metadata, checklist | If you want to add mappings or synonyms along with a new concept, you can do that in one place. Otherwise leave the corresponding tables blank. |
| T2 | Adding new synonym(s) to an existing concept(s) | concept_synonym_manual, metadata, checklist | |
| T3 | Adding a mapping to an existing concept | concept_relationship_manual, metadata, checklist | This template can be used for adding mappings for non-standard concepts or for de-duplication (adding a mapping from a standard concept to another standard concept to de-standardize it). |
| T4 | Adding a new vocabulary as non-standard with mappings (full or partial) to a standard vocabulary | concept_manual, concept_relationship_manual (if applicable), concept_synonym_manual (if applicable), vocabulary, metadata, checklist | If you want to add mappings or synonyms along with new concepts, you can do that in one place. Otherwise leave the corresponding tables blank. |
| T5 | Modifying attributes of an existing concept(s) | concept_manual, metadata, checklist | |
| T6 | Modifying mapping for an existing concept | concept_relationship_manual, metadata, checklist | |
| T7 | Promoting non-standard concepts to standard | concept_manual, metadata | |
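
The content of Table 1 can also be kept as a small lookup, for example to check that a submission contains every table its template calls for. The sketch below is illustrative only: `TEMPLATE_TABLES` and `missing_tables` are hypothetical names rather than part of any OHDSI tooling, and the table names are copied from Table 1 with the "if applicable" tables left out.

```python
# Tables that Table 1 lists for each template.
# Tables marked "if applicable" are treated as optional and omitted here.
TEMPLATE_TABLES = {
    "T1": {"concept_manual", "metadata", "checklist"},
    "T2": {"concept_synonym_manual", "metadata", "checklist"},
    "T3": {"concept_relationship_manual", "metadata", "checklist"},
    "T4": {"concept_manual", "vocabulary", "metadata", "checklist"},
    "T5": {"concept_manual", "metadata", "checklist"},
    "T6": {"concept_relationship_manual", "metadata", "checklist"},
    "T7": {"concept_manual", "metadata"},
}


def missing_tables(template, filled_tables):
    """Return the tables a template expects that are absent from filled_tables."""
    return sorted(TEMPLATE_TABLES[template] - set(filled_tables))
```

For instance, `missing_tables("T2", ["metadata"])` reports that the synonym table and the checklist are still to be filled.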

Covered use cases

I. Addition of new content

1. Adding new non-standard concept(s) to an existing vocabulary (T1)

Example: You have an NDC term that is not in the OHDSI Standardized Vocabularies.

Input: A term with its attributes (concept_name, concept_code, valid_start_date and valid_end_date, domain_id, concept_class_id, standard_concept)

Output: This will add record(s) to CONCEPT table

Note: we support adding non-standard concepts only. If you believe the concept(s) you want to add should be standard, you will first add them as non-standard (T1) and then will go through the process of promoting them to standard (T7).
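
Before filling in the T1 template, it can help to check each candidate term against the attribute list above. The following is a minimal sketch, not an official validator: the helper name `check_new_concept` and the example values are made up, and it assumes that a non-standard concept is expressed by leaving `standard_concept` empty.

```python
# Attributes named in the "Input" line above that every new concept needs.
REQUIRED = [
    "concept_name", "concept_code", "valid_start_date",
    "valid_end_date", "domain_id", "concept_class_id",
]


def check_new_concept(row):
    """Return a list of problems found in one candidate concept row."""
    problems = [f"missing {field}" for field in REQUIRED if not row.get(field)]
    # Assumption: an empty standard_concept marks the concept as non-standard.
    if row.get("standard_concept"):
        problems.append("standard_concept must be empty")
    return problems


# Placeholder values for illustration only; none of them is a real term.
example = {
    "concept_name": "Example term",
    "concept_code": "EX-0001",
    "valid_start_date": "2023-01-01",
    "valid_end_date": "2099-12-31",
    "domain_id": "Observation",
    "concept_class_id": "Example class",
    "standard_concept": None,
}
```

An empty result from `check_new_concept(example)` means that the row has every listed attribute and is not marked as standard.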

2. Adding new synonym(s) to existing concept (T2)

Example: You want to add a synonym in French to an existing ICD10CM term.

Input: The synonym(s) and the concept(s) for which you are adding the synonym(s).

Output: This will add record(s) to CONCEPT_SYNONYM table.

3. Adding new mapping(s) to an existing concept (T3)

Example: You want to add mapping for a non-standard HCPCS term to a standard SNOMED term.

Input: Two terms and relationship between them with its attributes (name, validity dates).

Output: This will add record(s) to CONCEPT_RELATIONSHIP table.

Note: This document supports adding 'Maps to' relationships only.
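
A mapping row for the T3 template could be assembled as sketched below. This is an illustration under stated assumptions: the column names follow the usual layout of concept_relationship_manual but should be checked against the template itself, the default validity dates are placeholders, and `maps_to_row` is a hypothetical helper.

```python
from datetime import date


def maps_to_row(source_code, source_vocabulary, target_code, target_vocabulary,
                valid_start=date(1970, 1, 1), valid_end=date(2099, 12, 31)):
    """Build one 'Maps to' row linking a source term to a standard target term."""
    if valid_end < valid_start:
        raise ValueError("valid_end_date precedes valid_start_date")
    return {
        "concept_code_1": source_code,
        "vocabulary_id_1": source_vocabulary,
        "concept_code_2": target_code,
        "vocabulary_id_2": target_vocabulary,
        # Only 'Maps to' relationships are supported by this document.
        "relationship_id": "Maps to",
        "valid_start_date": valid_start.isoformat(),
        "valid_end_date": valid_end.isoformat(),
    }
```

Fixing `relationship_id` inside the helper keeps other relationship types from entering a submission by accident.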

4. Adding a new vocabulary as non-standard with mappings (full or partial) to a standard vocabulary (T4)

Example: You want to add a new vocabulary as a non-standard (cannot be used to populate standard fields in OMOP CDM) with mappings to standard terms in the Vocabularies.

Input: Vocabulary and its attributes (vocabulary_name, vocabulary_reference), its terms with their attributes (concept_name, concept_code, valid_start_date and valid_end_date, domain_id, concept_class_id, standard_concept), standard terms in the OHDSI Vocabularies and 'Maps to' links to them.

Output: Your vocabulary will be added to VOCABULARY table, its terms will be added to CONCEPT table and 'Maps to' relationships will be added to CONCEPT_RELATIONSHIP table.

Note: Here, no internal relationships or hierarchies of the source vocabulary are transferred to the OHDSI Vocabularies. Complex vocabularies and new areas that may require data and ontology conventions (such as imaging, genomics or others) require discussion with the Vocabulary Team.

Vocabulary refresh

If you are planning to refresh the vocabulary in future, the process will look as follows. You will need to identify the changes you want to make (add more content or change something in the previous version of the vocabulary). Depending on what you choose, you will create a new issue with the corresponding templates. For example, if you want to add a new concept, change the mapping for an old concept and promote another concept to Standard, you will fill three templates (T1, T6 and T7).
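
Choosing templates for a refresh amounts to a lookup from the kind of change to the template number. A minimal sketch, in which the change labels and the `templates_for` helper are invented for illustration:

```python
# Hypothetical labels for the kinds of change, mapped to the templates in Table 1.
CHANGE_TO_TEMPLATE = {
    "add_concept": "T1",
    "add_synonym": "T2",
    "add_mapping": "T3",
    "modify_concept": "T5",
    "modify_mapping": "T6",
    "promote_to_standard": "T7",
}


def templates_for(changes):
    """Return the distinct templates needed for a list of planned changes."""
    return sorted({CHANGE_TO_TEMPLATE[change] for change in changes})
```

Called with `["add_concept", "modify_mapping", "promote_to_standard"]`, it returns the three templates from the example above.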

Note: If you want to refresh a vocabulary that is already incorporated into the OHDSI Standardized Vocabularies (available through athena.ohdsi.org), please contact the Vocabulary Team. The process will depend on how the vocabulary was processed initially and may involve either contribution through templates or contribution of code on GitHub.

II. Modification of existing content

Modification of existing content requires additional quality and content control performed by the Vocabulary Team and may require consensus of the OHDSI Community if such a change impacts other researchers or developers.

5. Modifying existing concept(s) (T5)

Example: You want to change a name, domain, or concept class of an existing concept.

Input: An existing term and the new attributes you want to change the old attributes to.

Output: This will modify record(s) in the CONCEPT table.

Note: Deprecation is out of scope of this document. If you believe a concept needs to be deprecated, please submit a GitHub issue.

6. Modifying existing relationship(s) of an existing concept (T6)

Example: You want to change the current mapping of an ICD-10(CM) code to a different one.

Input: The existing relationship you want to deprecate and a new relationship you want to create with its attributes (name, validity dates).

Output: The existing relationship(s) will be deprecated in CONCEPT_RELATIONSHIP table and new one(s) will be created.
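
The deprecate-and-replace pattern described above can be sketched as follows. It is only an illustration: `remap` is a hypothetical helper, the column names are assumed to match concept_relationship_manual, and marking the old row with `invalid_reason = "D"` and ending it the day before the change are assumptions that should be checked against the T6 template instructions.

```python
from datetime import date, timedelta


def remap(existing, new_code, new_vocabulary, change_date):
    """Return (deprecated_row, new_row) for changing the target of a mapping."""
    # Close the old relationship; the input row itself is left untouched.
    deprecated = dict(existing)
    deprecated["valid_end_date"] = (change_date - timedelta(days=1)).isoformat()
    deprecated["invalid_reason"] = "D"  # assumed marker for a deprecated row

    # Open the new relationship from the change date onwards.
    replacement = dict(existing)
    replacement.update(
        concept_code_2=new_code,
        vocabulary_id_2=new_vocabulary,
        valid_start_date=change_date.isoformat(),
        valid_end_date="2099-12-31",  # assumed open-ended validity date
        invalid_reason=None,
    )
    return deprecated, replacement
```

Returning both rows together reflects that a T6 submission carries the relationship to deprecate alongside the one to create.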

7. Promote non-standard terms to standard (T7)

Example: You subsequently choose to promote some of the non-standard terms you added to standard so that they can be used in CDM tables.

Input: The concept(s) you want to promote

Output: This will change record(s) in CONCEPT table.

Note: Promotion should be handled with caution if a concept has relationships to other concepts (such as "Maps to"), because those relationships may be deprecated as a result.

Not covered use cases

Use cases not covered in this document can generally be described as those that require extensive quality control that cannot be performed manually.

They include, but are not limited to, addition of vocabularies that need to be regularly refreshed in an automated fashion or that have a complex structure, refresh of vocabularies that have a complex structure or impact other OHDSI vocabularies, creation of classification terms on top of SNOMED, and changing OHDSI-built content (such as OMOP or RxNorm Extension).

We invite those who may be interested in contributing to join the Vocabulary WG calls and Vocabulary Team to learn more about the development practices, familiarize themselves with existing code on GitHub and participate in development.

Directions

  1. Open an issue on GitHub.

You will need to choose the 'Community contribution - modify existing content' or 'Community contribution - add new content' category. You will be taken to a new window; the body of the issue will direct you to the templates to fill in.

Please open an issue prior to starting your work. The issue should briefly describe proposed changes or additions, rationale behind them, and potential impact on the activities in OHDSI (such as research and ETL). This will help to identify other contributors that may be working on the same content and/or provide suggestions and recommendations at the initial stage of development.

  2. Fill in the corresponding templates.

The GitHub issues have links to the corresponding templates, which can also be found here. Each template has instructions and examples of how to fill it in.

Also, each template has a section for metadata about the submission and a checklist (quality checks) that must be completed before submission.

  3. Submit the filled templates and provide the link to the submission in the comments to the issue.

Please create a sub-folder for your submission here (name it as GitHubIssueNumber_ContributionName_YourName) and place the filled submission there. If the submission concerns modifying content, the Vocabulary Team will get back to you. Modify and re-upload the templates if needed.

  4. See your contribution in the Vocabularies release.

Contributions will be incorporated in the next release if they are submitted and approved at least 2 months before the target release date. Adding content requires no review. Modifying content requires review, the depth and length of which depend on the complexity and volume of the modification and may involve additional Vocabulary, CDM or other WG discussions. Community feedback regarding modifications will be collected to ensure that there are no competing perspectives on the contribution and that the community agrees on it.
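
Two mechanical rules from the steps above, the sub-folder name and the two-month lead time, can be sketched in code. The helper names are hypothetical. The guideline gives only the pattern GitHubIssueNumber_ContributionName_YourName, so stripping spaces and capitalizing each word is an assumption, as is reading "2 months" as calendar months.

```python
import calendar
import re
from datetime import date


def submission_folder(issue_number, contribution_name, your_name):
    """Build a folder name of the form GitHubIssueNumber_ContributionName_YourName."""
    def squash(text):
        # Join words without separators, capitalizing the first letter of each.
        words = [w for w in re.split(r"[^A-Za-z0-9]+", text) if w]
        return "".join(w[:1].upper() + w[1:] for w in words)
    return f"{issue_number}_{squash(contribution_name)}_{squash(your_name)}"


def latest_approval_date(release_date, months_before=2):
    """Return the last approval date that is still months_before ahead of release."""
    month_index = release_date.year * 12 + (release_date.month - 1) - months_before
    year, month = divmod(month_index, 12)
    month += 1
    # Clamp the day for shorter months (e.g. the 31st maps to the 30th).
    day = min(release_date.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def makes_release(approved_on, release_date):
    """True if a contribution approved on approved_on can enter that release."""
    return approved_on <= latest_approval_date(release_date)
```

For example, `submission_folder(123, "Add French synonyms", "Jane Doe")` yields a name built from a made-up issue number and contributor.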
