CDISC - OHDSI/Vocabulary-v5.0 GitHub Wiki

CDISC

Overview

The Clinical Data Interchange Standards Consortium (CDISC) is an open, non-profit organization that develops and supports global data standards to improve the quality and interoperability of data in medical research and healthcare. CDISC partners with NCI Enterprise Vocabulary Services (EVS) to develop and support controlled terminology for all CDISC standards initiatives.

Currently, only the CDISC Controlled Terminology supporting the Study Data Tabulation Model (SDTM) is represented as an OMOP vocabulary. NCI maintains it as part of the NCI Metathesaurus (NCIm).

Sources

All CDISC sources were obtained from the NCI Metathesaurus.

Information about concepts can be found in the mrconso table.

Table 1 – Fields from NCIm mrconso table used for CDISC integration as OMOP vocabulary

| mrconso field name | mrconso field definition | How it was used during the CDISC OMOP conversion |
|---|---|---|
| cui | Unique concept identifier | Used during mappings integration. Can potentially be used to obtain the concept hierarchy |
| scui | The source asserted concept identifier | Represented as concept_code |
| code | Most useful source asserted identifier (if the source vocabulary has more than one identifier), or a Metathesaurus-generated source entry identifier (if the source vocabulary has none) | Used to identify the second part of the concept_code, which is represented in the ‘str’ field |
| str | The full set of concept names within one scui | Used to obtain concept_names, second parts of concept_codes, and synonyms |

Transformation

The procedures for transforming Concepts from the source to the OMOP Standard Vocabularies can be found on the OHDSI GitHub.

Concept Names

The characteristics of a single concept (concept_code, concept_name, concept_synonym) are represented by the rows sharing one scui in the mrconso table. The concept_name was chosen from the ‘str’ values according to the prioritization rules below. All other ‘str’ values within the same scui that have sab = ‘CDISC’ were taken as synonyms.

Rules to define a concept_name:

  1. When a code like ‘%CD’ exists, take the ‘str’ of the row whose code is the one without ‘CD’ as the concept_name (Table 2)
  2. When no code like ‘%CD’ is present, take the ‘str’ where ‘mrconso.tty’ = ‘PT’ and ‘mrconso.sab’ = ‘CDISC’ (not NCI) as the concept_name (Table 3)
  3. When no ‘PT’ with sab = ‘CDISC’ exists, take the ‘str’ with tty = ‘SY’ and ispref = ‘Y’ (Table 4)
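The three rules can be sketched in Python as follows. This is a minimal illustration, not the code used in the actual conversion. The `Row` type and `pick_concept_name` function are hypothetical names. Rule 1 is read here as "find the CDISC row whose code equals a ‘%CD’ code with the trailing ‘CD’ removed", which is an interpretation based on Table 2, and rule 3 is assumed to apply to rows with sab = ‘CDISC’ only.

```python
from typing import NamedTuple, Optional, Sequence


class Row(NamedTuple):
    """One mrconso row, restricted to the fields used by the naming rules."""
    scui: str
    tty: str
    sab: str
    code: str
    str_: str          # the mrconso 'str' field (renamed to avoid the builtin)
    ispref: str = "Y"


def pick_concept_name(rows: Sequence[Row]) -> Optional[str]:
    """Apply the three naming rules, in order, to all rows of one scui."""
    cdisc = [r for r in rows if r.sab == "CDISC"]

    # Rule 1 (assumed reading): a code ending in 'CD' exists, so the name is
    # the str of the row whose code is that value without the trailing 'CD'.
    cd_codes = {r.code for r in cdisc if r.code.endswith("CD")}
    for r in cdisc:
        if r.code + "CD" in cd_codes:
            return r.str_

    # Rule 2: the preferred term asserted by CDISC (not NCI).
    for r in cdisc:
        if r.tty == "PT" and not r.code.endswith("CD"):
            return r.str_

    # Rule 3: fall back to a preferred CDISC synonym.
    for r in cdisc:
        if r.tty == "SY" and r.ispref == "Y":
            return r.str_

    return None


# Rows taken from Table 2 and Table 4 of this page.
TABLE_2 = [
    Row("C198232", "SY", "CDISC", "C198232", "Air Pressure"),
    Row("C198232", "PT", "NCI", "C198232", "Air Pressure"),
    Row("C198232", "PT", "CDISC", "SDTM-AUTEST", "Air Pressure"),
    Row("C198232", "PT", "CDISC", "SDTM-AUTESTCD", "AIRPRSSR"),
]
TABLE_4 = [
    Row("C62017", "SY", "CDISC", "C62017", "Type 1 2nd degree AV Block"),
    Row("C62017", "PT", "NCI", "C62017", "AV Block Second Degree Mobitz Type I"),
]

print(pick_concept_name(TABLE_2))  # Air Pressure
print(pick_concept_name(TABLE_4))  # Type 1 2nd degree AV Block
```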

Concept Code

In most cases, ‘scui’ was taken as the concept_code. Where a ‘code’ like ‘%CD’ exists, the concept_code is composite: ‘scui’ concatenated with the ‘str’ value of that row (for example, C198232-AIRPRSSR). The value carried by the ‘%CD’ row is considered to be the real source code; however, not all concepts have one.

See examples below.

Table 2 – code like ‘%CD’ exists

| scui | tty | sab | code | str | CDISC OMOP attribute |
|---|---|---|---|---|---|
| C198232 | SY | CDISC | C198232 | Air Pressure | concept_synonym |
| C198232 | PT | NCI | C198232 | Air Pressure | - |
| C198232 | PT | CDISC | SDTM-AUTEST | Air Pressure | concept_name |
| C198232 | PT | CDISC | SDTM-AUTESTCD | AIRPRSSR | 2nd part of the concept_code |

In the example above, AIRPRSSR was assigned as the second part of the concept_code (C198232-AIRPRSSR), and Air Pressure – as the concept_name. Other str, except that for NCI, are synonyms.
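The composite code from Table 2 can be reproduced with a short Python sketch. The function name `build_concept_code` and the dictionary layout of the rows are hypothetical. The ‘-’ separator is taken from the C198232-AIRPRSSR example, and restricting the search to rows with sab = ‘CDISC’ is an assumption.

```python
from typing import Dict, Sequence


def build_concept_code(rows: Sequence[Dict[str, str]]) -> str:
    """Return the concept_code for all mrconso rows that share one scui.

    If a CDISC row with a code ending in 'CD' exists, its str value becomes
    the second part of the code; otherwise the scui alone is used.
    """
    scui = rows[0]["scui"]
    for row in rows:
        # Assumption: only CDISC-asserted rows can carry the '%CD' code.
        if row["sab"] == "CDISC" and row["code"].endswith("CD"):
            return f"{scui}-{row['str']}"
    return scui


# Rows from Table 2 (a '%CD' code exists).
air_pressure = [
    {"scui": "C198232", "tty": "SY", "sab": "CDISC", "code": "C198232", "str": "Air Pressure"},
    {"scui": "C198232", "tty": "PT", "sab": "CDISC", "code": "SDTM-AUTEST", "str": "Air Pressure"},
    {"scui": "C198232", "tty": "PT", "sab": "CDISC", "code": "SDTM-AUTESTCD", "str": "AIRPRSSR"},
]
# A row from Table 3 (no '%CD' code).
nephroblastoma = [
    {"scui": "C40407", "tty": "PT", "sab": "CDISC", "code": "C40407", "str": "NEPHROBLASTOMA, MALIGNANT"},
]

print(build_concept_code(air_pressure))    # C198232-AIRPRSSR
print(build_concept_code(nephroblastoma))  # C40407
```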

Table 3 – code like ‘%CD’ does not exist

| scui | tty | sab | code | str | CDISC OMOP attribute |
|---|---|---|---|---|---|
| C40407 | SY | CDISC | C40407 | Wilms Tumor of the Kidney | concept_synonym |
| C40407 | SY | CDISC | C40407 | Wilms' Tumor of the Kidney | concept_synonym |
| C40407 | SY | CDISC | C40407 | Embryonal Nephroma | concept_synonym |
| C40407 | SY | CDISC | C40407 | Nephroblastoma | concept_synonym |
| C40407 | SY | CDISC | C40407 | Renal Wilms' Tumor | concept_synonym |
| C40407 | PT | CDISC | C40407 | NEPHROBLASTOMA, MALIGNANT | concept_name |
| C40407 | PT | NCI | C40407 | Kidney Wilms Tumor | - |

In the example above, C40407 was assigned as the concept_code, and NEPHROBLASTOMA, MALIGNANT – as concept_name. Other str, except that for NCI, are synonyms.

Table 4 – no ‘PT’ with sab = ‘CDISC’ exists

| scui | tty | ispref | sab | code | str | CDISC OMOP attribute |
|---|---|---|---|---|---|---|
| C62017 | SY | Y | CDISC | C62017 | Type 1 2nd degree AV Block | concept_name |
| C62017 | PT | Y | NCI | C62017 | AV Block Second Degree Mobitz Type I | - |
| C62017 | PT | Y | NCI | C62017 | Mobitz Type I Second Degree AV Block | - |
| C62017 | PT | Y | NCI | C62017 | AV Block Second Degree Möbitz Type I | - |

In the example above, C62017 was assigned as the concept_code, and Type 1 2nd degree AV Block as the concept_name.

Standard Concepts

All CDISC concepts are non-Standard.

Domains and Concept Classes

The concept_class_id and domain_id are obtained by mapping the attributes from the NCI mrsty table to SNOMED concept classes and domains. When one concept has several (2 or more) attributes, and as a result several classes and/or domains, the default class and domain are assigned (concept_class_id = ‘Observable Entity’, domain_id = ‘Observation’).

Mappings of CDISC attributes to SNOMED concept classes and domains are available from here.
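The default rule described above can be sketched in Python. This is only an illustration: the function name is hypothetical, and the entries in `EXAMPLE_TYPE_MAP` are placeholders, not the project's real attribute mapping. Falling back to the default when no attribute is mapped at all is also an assumption, since this page does not cover that case.

```python
from typing import Dict, Iterable, Tuple

ClassDomain = Tuple[str, str]  # (concept_class_id, domain_id)

# Default stated on this page for concepts with conflicting attributes.
DEFAULT_CLASS_DOMAIN: ClassDomain = ("Observable Entity", "Observation")

# HYPOTHETICAL placeholder entries; the real attribute mapping is maintained
# separately and is not reproduced here.
EXAMPLE_TYPE_MAP: Dict[str, ClassDomain] = {
    "Example Attribute A": ("Procedure", "Procedure"),
    "Example Attribute B": ("Body Structure", "Spec Anatomic Site"),
}


def assign_class_and_domain(
    attributes: Iterable[str],
    type_map: Dict[str, ClassDomain],
) -> ClassDomain:
    """Pick (concept_class_id, domain_id) for one concept from its attributes."""
    targets = {type_map[a] for a in attributes if a in type_map}
    if len(targets) == 1:
        # All mapped attributes agree on a single class and domain.
        return next(iter(targets))
    # Several different classes/domains (or, by assumption, none): use default.
    return DEFAULT_CLASS_DOMAIN


print(assign_class_and_domain(["Example Attribute A"], EXAMPLE_TYPE_MAP))
print(assign_class_and_domain(
    ["Example Attribute A", "Example Attribute B"], EXAMPLE_TYPE_MAP))
```

The set of targets is built first so that a concept with several attributes that all resolve to the same class and domain keeps that assignment and does not fall back to the default.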


| domain_id | count |
|---|---|
| Observation | 16897 |
| Measurement | 8158 |
| Spec Anatomic Site | 1792 |
| Procedure | 620 |
| Unit | 571 |
| Condition | 455 |
| Geography | 276 |
| Device | 208 |
| Provider | 41 |

Concept Relationships

Lateral relationships

The only relationships introduced at the time of vocabulary integration were mappings: ‘Maps to’ and ‘Maps to value’. The majority of these are uncurated, source-derived mappings. However, a pre-selected portion of the codes was curated manually.

More mapping information (i.e. provenance and directionality) can be found in the concept_relationship_metadata table. No other lateral (intra-vocabulary) relationships were introduced.

Hierarchy

CDISC Concepts are non-Standard Concepts and therefore do not participate in the hierarchy of the CONCEPT_ANCESTOR table. No other hierarchical (intra-vocabulary) relationships were introduced.

Instructions for ETL

All the CDISC concepts are non-Standard. That means they have to be mapped to the corresponding Standard Concepts using the CONCEPT_RELATIONSHIP table ("Maps to" and occasionally "Maps to value" records). Most of them are mapped to single Concepts, generating one-to-one records, but some of them create multiple records or have mappings to other domains.

From the ETL perspective, sequential joins are needed to obtain proper mappings.

When your data contains CDISC names, the querying scenarios are:

  • JOIN to concept_name, or
  • JOIN to concept_synonym

When your data contains CDISC codes, the querying scenarios are:

  • JOIN to the second part of the concept_code, or
  • JOIN to the first part of the concept_code, or
  • JOIN to concept_name, or
  • JOIN to concept_synonym
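A sequential lookup of this kind can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions: the scenarios are treated as fallbacks tried in the listed order, vocabulary_id = 'CDISC' is assumed, only a subset of the OMOP CDM columns is created, and every concept_id and the target concepts in the demo data are hypothetical.

```python
import sqlite3

# Lookups against the CDISC vocabulary; each returns matching concept_ids.
BY_CODE_PART_2 = """
    SELECT concept_id FROM concept
    WHERE vocabulary_id = 'CDISC'
      AND instr(concept_code, '-') > 0
      AND substr(concept_code, instr(concept_code, '-') + 1) = ?"""
BY_CODE_PART_1 = """
    SELECT concept_id FROM concept
    WHERE vocabulary_id = 'CDISC'
      AND CASE WHEN instr(concept_code, '-') > 0
               THEN substr(concept_code, 1, instr(concept_code, '-') - 1)
               ELSE concept_code END = ?"""
BY_NAME = """
    SELECT concept_id FROM concept
    WHERE vocabulary_id = 'CDISC' AND concept_name = ?"""
BY_SYNONYM = """
    SELECT s.concept_id FROM concept_synonym s
    JOIN concept c ON c.concept_id = s.concept_id
    WHERE c.vocabulary_id = 'CDISC' AND s.concept_synonym_name = ?"""
MAPS_TO = """
    SELECT relationship_id, concept_id_2 FROM concept_relationship
    WHERE concept_id_1 = ?
      AND relationship_id IN ('Maps to', 'Maps to value')
      AND invalid_reason IS NULL
    ORDER BY relationship_id, concept_id_2"""


def map_cdisc_value(conn, value, is_code):
    """Try each lookup in order; return the mappings of the first one that hits."""
    steps = [BY_NAME, BY_SYNONYM]
    if is_code:
        steps = [BY_CODE_PART_2, BY_CODE_PART_1] + steps
    for sql in steps:
        source_ids = [row[0] for row in conn.execute(sql, (value,))]
        if source_ids:
            targets = []
            for concept_id in source_ids:
                targets.extend(conn.execute(MAPS_TO, (concept_id,)).fetchall())
            return targets
    return []


def build_demo_db():
    """In-memory database with two CDISC concepts and HYPOTHETICAL ids/targets."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (concept_id INTEGER, concept_name TEXT,
                              vocabulary_id TEXT, concept_code TEXT);
        CREATE TABLE concept_synonym (concept_id INTEGER, concept_synonym_name TEXT);
        CREATE TABLE concept_relationship (concept_id_1 INTEGER, concept_id_2 INTEGER,
                                           relationship_id TEXT, invalid_reason TEXT);
    """)
    conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?)", [
        (1, "Air Pressure", "CDISC", "C198232-AIRPRSSR"),
        (2, "NEPHROBLASTOMA, MALIGNANT", "CDISC", "C40407"),
        (101, "Hypothetical standard target A", "SNOMED", "A"),
        (102, "Hypothetical standard target B", "SNOMED", "B"),
    ])
    conn.execute("INSERT INTO concept_synonym VALUES (2, 'Nephroblastoma')")
    conn.executemany("INSERT INTO concept_relationship VALUES (?, ?, ?, NULL)", [
        (1, 101, "Maps to"),
        (2, 102, "Maps to"),
    ])
    return conn


conn = build_demo_db()
print(map_cdisc_value(conn, "AIRPRSSR", is_code=True))         # [('Maps to', 101)]
print(map_cdisc_value(conn, "Nephroblastoma", is_code=False))  # [('Maps to', 102)]
```

The comparisons are exact and case-sensitive. Real source data may need normalization (trimming, case folding) before the lookup, which is outside the scope of this sketch.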