Vocab. NAACCR - OHDSI/Vocabulary-v5.0 GitHub Wiki
NAACCR
NAACCR (North American Association of Central Cancer Registries) is a data standard used to code data in US cancer registries. NAACCR is arguably the best existing data dictionary: it covers the majority of cancer types and includes the critical diagnostic features and high-level treatment classifications used in cancer epidemiology.
Source structure
NAACCR data is not provided in the form of a standard ontology. Concepts exist as a list of cancer-related variables, each provided with a list of valid values and their codes. The variables themselves are split into schemas representing diagnostically related groups of neoplasms, such as lymphomas or esophageal neoplasms.
Source tables used for ingestion of NAACCR into the OMOP CDM are derived using the SEER API.
Internal hierarchy
All NAACCR concepts form an ontology that runs from the Schema level through the Variable level to the Value level. All relationships are stated explicitly across levels, meaning that Values also have relations directly to the Schema level. Concepts on the Variable level are additionally linked in a hierarchy of their own, indicated by the relationship_id values 'Has parent item' and 'Date of variable'. Variables that belong to more than one Schema have stated relations to all of them; such Variables do not specify a schema name in their code.
Because NAACCR is currently treated as a source ontology, its hierarchy is not represented in the CONCEPT_ANCESTOR table, but it is fully present in the CONCEPT_RELATIONSHIP table.
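Since CONCEPT_ANCESTOR is not populated, any hierarchy walk has to follow CONCEPT_RELATIONSHIP rows directly. The following is a minimal sketch of such a walk over the Variable-level hierarchy. It uses an in-memory list of toy rows instead of a database, the concept codes (`var_a` to `var_d`) are made up, and it assumes that 'Has parent item' points from child to parent and that each variable has at most one parent.

```python
# Toy (concept_code_1, relationship_id, concept_code_2) rows standing in for
# CONCEPT_RELATIONSHIP. The codes are hypothetical; only the relationship_id
# 'Has parent item' is taken from the vocabulary description.
ROWS = [
    ("var_c", "Has parent item", "var_b"),
    ("var_b", "Has parent item", "var_a"),
    ("var_d", "Has parent item", "var_a"),
]


def parent_chain(code, rows, relationship_id="Has parent item"):
    """Return the ancestors of `code`, nearest first.

    Assumes the relationship points from child to parent and that every
    concept has at most one parent (both are assumptions of this sketch).
    """
    parents = {src: dst for src, rel, dst in rows if rel == relationship_id}
    chain = []
    seen = {code}
    while code in parents:
        code = parents[code]
        if code in seen:  # guard against accidental cycles in the data
            break
        seen.add(code)
        chain.append(code)
    return chain


print(parent_chain("var_c", ROWS))
```

Against a real vocabulary database the same walk would be expressed as repeated joins (or a recursive query) on CONCEPT_RELATIONSHIP filtered by relationship_id.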
Code format
All NAACCR codes are ontological, meaning that each code is built by concatenating the codes of all preceding ontological levels, joined with `@`. Schema codes coincide with schema names; Variable and Value codes are numeric.
Type of concept | concept_code | concept_name |
---|---|---|
Site-specific variable | brain@2900 | Functional Neurologic Status - Karnofsky Performance Scale (KPS) |
Site-nonspecific variable | 2810 | CS Extension |
Site-specific value | colon@2810@050 | (Adeno)carcinoma, noninvasive, in a polyp or adenoma |
Site-nonspecific value | 1004@99 | [TNM Clinical Stage Group] Unknown, not staged |
A site-specific value code contains the parent schema name (colon), the variable code (2810) and the value code proper (050). For site-nonspecific values and variables, the schema name is omitted.
*Note: site-specific values may belong to site-nonspecific variables, as is the case in this example.*
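The code format above can be taken apart mechanically. The sketch below splits a Variable- or Value-level concept_code into its schema, variable and value parts. The names `NaaccrCode` and `parse_naaccr_code` are hypothetical, and the sketch assumes that a schema name is never made up of digits only, so that a leading numeric part can be read as a variable code.

```python
from typing import NamedTuple, Optional


class NaaccrCode(NamedTuple):
    schema: Optional[str]   # e.g. "colon"; None for site-nonspecific codes
    variable: str           # variable code, e.g. "2810"
    value: Optional[str]    # value code, e.g. "050"; None for variables


def parse_naaccr_code(concept_code: str) -> NaaccrCode:
    """Split a Variable- or Value-level concept_code on '@'.

    Schema-level codes (a bare schema name) are not handled here.
    """
    parts = concept_code.split("@")
    # Assumption: a leading part that is not purely numeric is a schema name.
    schema = None if parts[0].isdigit() else parts[0]
    rest = parts if schema is None else parts[1:]
    if not 1 <= len(rest) <= 2:
        raise ValueError(f"unexpected code shape: {concept_code!r}")
    variable = rest[0]
    value = rest[1] if len(rest) == 2 else None
    return NaaccrCode(schema, variable, value)


for code in ("brain@2900", "2810", "colon@2810@050", "1004@99"):
    print(code, "->", parse_naaccr_code(code))
```

The four inputs are the example codes from the table above.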
Concept classes
Schema level | Description |
---|---|
NAACCR Schema | Top level of hierarchy, grouper concepts |
NAACCR Proc Schema | Schemas exclusively containing medical and surgical procedures related to cancers |
Variable level | |
NAACCR Variable | Variables belonging to schemas |
Value level | |
NAACCR Value | Concepts representing permissible values for most variables |
NAACCR Procedure | Medical procedures belonging to specific schemas |
Permissible Range | Concepts representing allowed numeric ranges for variables. Numeric values outside the specified range must be treated as specific codes or conversion artifacts. See the "3. Populate Modifier record in Measurement for values as numbers" proposal for details |
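The Permissible Range rule can be illustrated with a small check. This is only a sketch: the function name, the inclusive bounds and the three return labels are assumptions made for illustration, and the real range limits have to be taken from the corresponding range concept.

```python
def classify_raw_value(raw: str, low: float, high: float) -> str:
    """Classify a raw source value against a permissible numeric range.

    Returns one of three hypothetical labels:
      "numeric"          - parses as a number inside [low, high]
      "code_or_artifact" - parses as a number but falls outside the range
      "non_numeric"      - does not parse as a number at all
    """
    try:
        number = float(raw)
    except ValueError:
        return "non_numeric"
    # Assumption: both bounds are inclusive.
    if low <= number <= high:
        return "numeric"
    return "code_or_artifact"


print(classify_raw_value("12.5", 0.0, 100.0))
print(classify_raw_value("999", 0.0, 100.0))
```

Values labelled "code_or_artifact" would then be looked up as specific value codes rather than stored as numbers.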
Domains by class
NAACCR concepts belong to different domains depending on their clinical meaning.
Schema level | Possible DOMAIN_ID | Description |
---|---|---|
NAACCR Schema | Observation | Hierarchical level, so concepts are non-specific groupers |
NAACCR Proc Schema | Observation | Hierarchical level, so concepts are non-specific groupers |
Variable level | ||
NAACCR Variable | Measurement, Observation, Metadata, Episode | Various domains depending on clinical meaning |
Value level | ||
NAACCR Value | Meas Value, Procedure, Observation, Drug | Concepts representing permissible values for most variables. Meas Value is the most common; other domains are chosen depending on the domain of the parent variable |
NAACCR Procedure | Procedure, Observation | Procedure domain is default. Observation domain is for concepts indicating special procedure context (e.g. procedure not performed) |
Permissible Range | Meas Value | Currently, numeric concepts don't have a dedicated domain |
Standard status and mapping, by class
NAACCR concepts currently do not have active mappings to concepts from other vocabularies, excluding some NAACCR variables which are mapped to Standard concepts in the Episode vocabulary.
Schema level | Description |
---|---|
NAACCR Schema | Non-standard without mapping |
NAACCR Proc Schema | Non-standard without mapping |
Variable level | |
NAACCR Variable | Standard and non-standard concepts; non-standard concepts may have mapping to standard concepts |
Value level | |
NAACCR Value | Standard and non-standard unmapped concepts |
NAACCR Procedure | Standard concepts, always map to self |
Permissible Range | Non-standard without mapping |
External relations
NAACCR Schema concepts have specific relations to precoordinated standard ICDO Condition concepts in the ICDO3 vocabulary, which are sourced from SEER. These relations link neoplasm diagnoses to the NAACCR schemas whose variables add further detail about the diagnosis and treatment history. They are intended to be used in extended ETL logic.
Source vocabulary | Source concept_class_id | relationship_id | Target vocabulary | Target concept_class_id |
---|---|---|---|---|
NAACCR | NAACCR Schema | Schema to ICDO | ICDO3 | ICDO Condition |
NAACCR | NAACCR Proc Schema | Proc Schema to ICDO | ICDO3 | ICDO Condition |
Reverse relations | ||||
ICDO3 | ICDO Condition | ICDO to Schema | NAACCR | NAACCR Schema |
ICDO3 | ICDO Condition | ICDO to Proc Schema | NAACCR | NAACCR Proc Schema |
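In an ETL, the reverse relations can be used to find which schemas apply to a recorded diagnosis. The following is a rough sketch over toy rows. The relationship_id values come from the table above, while the ICDO3 condition codes (`icdo_cond_a`, `icdo_cond_b`), the procedure schema code (`example_proc_schema`) and the pairings themselves are placeholders, not real vocabulary content.

```python
# Toy rows: (source_code, relationship_id, target_code).
# Condition codes, the proc schema code and the pairings are placeholders.
RELATIONS = [
    ("icdo_cond_a", "ICDO to Schema", "colon"),
    ("icdo_cond_a", "ICDO to Proc Schema", "example_proc_schema"),
    ("icdo_cond_b", "ICDO to Schema", "brain"),
]

REVERSE_RELATIONSHIPS = ("ICDO to Schema", "ICDO to Proc Schema")


def schemas_for_condition(condition_code, relations):
    """Group the schemas related to one ICDO3 condition by relationship_id."""
    found = {rel: [] for rel in REVERSE_RELATIONSHIPS}
    for src, rel, dst in relations:
        if src == condition_code and rel in found:
            found[rel].append(dst)
    return found


print(schemas_for_condition("icdo_cond_a", RELATIONS))
```

The schemas found this way determine which NAACCR variables are worth looking for in the source record for that diagnosis.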
External links
- NAACCR home site
- SEER API used to derive source tables
- Documentation regarding joint NAACCR and ICDO3 implementation
- Specifications for the upcoming Episode table, which is meant to hold some of the NAACCR variable concepts
- NAACCR dictionary browser
- Detailed ETL instructions on Oncology Workgroup Wiki
- Illustrated crosswalks for ETL process