Vocab. ATC - OHDSI/Vocabulary-v5.0 GitHub Wiki

Authors: Alexander Davydov, Anna Ostropolets, Oleg Zhuk, Anton Tatur, Vlad Korsik, Polina Talapova, Christian Reich.

ATC overview


In OHDSI Standardized Vocabularies, the Anatomical Therapeutic Chemical (ATC) Classification System is used as the basis of the Drug hierarchy. The vocabulary itself is maintained by the World Health Organization Collaborating Centre for Drug Statistics Methodology (WHOCC) and is built around the organ or system the drugs target and their therapeutic, pharmacological, and chemical characteristics. ATC codes function as semantic identifiers. RxNorm and RxNorm Extensions are hierarchical descendants of the ATC concepts.

ATC use cases

ATC can be used to support the mapping of source drugs to their standard counterparts in the OHDSI Vocabularies, or to retrieve standard drugs within the drug hierarchy.

Using ATC for ETL

All ATC concepts are either "Classification" (valid concepts) or "Non-standard" (invalid concepts). If your source data contains ATC codes, you can use “Maps to” links to get RxNorm ingredients. Some of the ATC 5th level codes have multiple ingredients. Only ATC 5th level codes have links to ingredients; the higher levels (1st to 4th) do not.

Using ATC for retrieving drugs within a group

For example, you may want to retrieve all drugs that belong to glucocorticosteroids and have a systemic route (oral or injectable, but not topical). To do that, you’d first specify a list of ATC codes:

concept_id concept_code concept_class concept_name
21602723 H02A ATC 3rd CORTICOSTEROIDS FOR SYSTEMIC USE, PLAIN
21602745 H02B ATC 3rd CORTICOSTEROIDS FOR SYSTEMIC USE, COMBINATIONS

In the hierarchy, ATC codes are positioned above the RxNorm and RxNorm Extensions codes. They are considered ancestors, while RxNorm / RxNorm Extensions are descendants.

You’d then select all descendants of these two codes in Atlas, or query concept_ancestor with the two ATC concepts as ancestor_concept_id. Note that the resulting set will also contain ingredients, which you will need to exclude if you are looking for drugs with a systemic route only (an ingredient does not have a route specified).
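The concept_ancestor query described above can be sketched as follows. This is a minimal illustration that uses Python's sqlite3 module with a toy in-memory copy of the concept and concept_ancestor tables. The two ATC concept_ids come from the table above; the descendant rows (1001 to 1003) and the function name systemic_descendants are placeholders made up for the example, and a real query would run against your own vocabulary schema.

```python
import sqlite3

# Toy in-memory copy of two OMOP tables. The descendant rows (1001-1003)
# are placeholders for illustration, not real vocabulary content.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id INTEGER, concept_name TEXT, concept_class_id TEXT);
CREATE TABLE concept_ancestor (
    ancestor_concept_id INTEGER, descendant_concept_id INTEGER);
INSERT INTO concept VALUES
  (21602723, 'CORTICOSTEROIDS FOR SYSTEMIC USE, PLAIN', 'ATC 3rd'),
  (21602745, 'CORTICOSTEROIDS FOR SYSTEMIC USE, COMBINATIONS', 'ATC 3rd'),
  (1001, 'placeholder ingredient', 'Ingredient'),
  (1002, 'placeholder Oral Tablet', 'Clinical Drug Form'),
  (1003, 'placeholder Injectable Solution', 'Clinical Drug Form');
INSERT INTO concept_ancestor VALUES
  (21602723, 21602723), (21602723, 1001),
  (21602723, 1002), (21602745, 1003);
""")

def systemic_descendants(conn, atc_concept_ids):
    """Return drug descendants of the given ATC concepts, without
    ATC concepts themselves and without Ingredients (which carry no route)."""
    placeholders = ",".join("?" for _ in atc_concept_ids)
    sql = f"""
        SELECT DISTINCT c.concept_id, c.concept_name
        FROM concept_ancestor ca
        JOIN concept c ON c.concept_id = ca.descendant_concept_id
        WHERE ca.ancestor_concept_id IN ({placeholders})
          AND c.concept_class_id NOT LIKE 'ATC%'
          AND c.concept_class_id <> 'Ingredient'
        ORDER BY c.concept_id
    """
    return conn.execute(sql, list(atc_concept_ids)).fetchall()

rows = systemic_descendants(conn, [21602723, 21602745])
for concept_id, concept_name in rows:
    print(concept_id, concept_name)
```

Filtering on concept_class_id is one possible way to drop ingredients; other filters (for example, on standard_concept) may be needed depending on the analysis.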

Using ATC for retrieving all ingredients of an ATC code

ATC codes are directly mapped to Ingredients. There are four types of relationships between ATC codes and ingredients: ATC - RxNorm pr lat, ATC - RxNorm sec lat, ATC - RxNorm pr up, and ATC - RxNorm sec up (explained below).

concept_id concept_code concept_class concept_name
21600104 A02BD03 ATC 5th lansoprazole, amoxicillin and metronidazole; systemic
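As a rough sketch, the ingredients of an ATC code could be collected by filtering concept_relationship on the four relationship_ids named above. The example below uses Python's sqlite3 module and a toy table; the target concept_ids 9001 to 9004 are placeholders rather than real Ingredient concepts, the assignment of relationship types to them is assumed for illustration, and the function name ingredients_for_atc is hypothetical.

```python
import sqlite3

# The four ATC-to-ingredient relationship_ids named in the text.
ATC_INGREDIENT_RELATIONSHIPS = (
    "ATC - RxNorm pr lat",
    "ATC - RxNorm sec lat",
    "ATC - RxNorm pr up",
    "ATC - RxNorm sec up",
)

# Toy concept_relationship table. Target ids 9001-9004 are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT);
INSERT INTO concept_relationship VALUES
  (21600104, 9001, 'ATC - RxNorm pr lat'),
  (21600104, 9002, 'ATC - RxNorm sec lat'),
  (21600104, 9003, 'ATC - RxNorm sec lat'),
  (21600104, 9004, 'ATC - RxNorm');
""")

def ingredients_for_atc(conn, atc_concept_id):
    """Return (ingredient_concept_id, relationship_id) pairs for one ATC concept."""
    placeholders = ",".join("?" for _ in ATC_INGREDIENT_RELATIONSHIPS)
    sql = f"""
        SELECT concept_id_2, relationship_id
        FROM concept_relationship
        WHERE concept_id_1 = ?
          AND relationship_id IN ({placeholders})
        ORDER BY concept_id_2
    """
    params = [atc_concept_id, *ATC_INGREDIENT_RELATIONSHIPS]
    return conn.execute(sql, params).fetchall()

ingredient_links = ingredients_for_atc(conn, 21600104)
```

The plain 'ATC - RxNorm' row is left out on purpose, since that relationship connects ATC classes to drug products rather than to ingredients.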

ATC structure

Concept Code and Concept Name

ATC codes are alphanumeric and vary by classification level:

  • 1st Class: One letter (e.g., A).
  • 2nd Class: One letter + two digits (e.g., A01).
  • 3rd Class: One letter + two digits + one letter (e.g., A01A).
  • 4th Class: One letter + two digits + two letters (e.g., A01AA).
  • 5th Class: One letter + two digits + two letters + two digits; the full code for a specific ingredient or combination (e.g., A01AA01 for Sodium fluoride).

Each level adds more specificity to the drug's classification.
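The code structure can be checked mechanically. The sketch below infers the level from the shape of a code, using patterns derived from the example codes listed above (A, A01, A01A, A01AA, A01AA01). The function name atc_level is an assumption for illustration and is not part of the OHDSI codebase.

```python
import re

# One pattern per level, derived from the example codes in the list above.
ATC_LEVEL_PATTERNS = [
    ("ATC 1st", re.compile(r"[A-Z]")),
    ("ATC 2nd", re.compile(r"[A-Z]\d{2}")),
    ("ATC 3rd", re.compile(r"[A-Z]\d{2}[A-Z]")),
    ("ATC 4th", re.compile(r"[A-Z]\d{2}[A-Z]{2}")),
    ("ATC 5th", re.compile(r"[A-Z]\d{2}[A-Z]{2}\d{2}")),
]

def atc_level(code):
    """Return the concept class implied by the code's shape, or None."""
    for level, pattern in ATC_LEVEL_PATTERNS:
        if pattern.fullmatch(code):
            return level
    return None

for code in ["A", "A01", "A01A", "A01AA", "A01AA01"]:
    print(code, atc_level(code))
```

A check like this only validates the shape of a code; it does not tell you whether the code exists in the current ATC index.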

The concept names follow this format:

  • 1st-3rd Classes: Uppercase.

  • 4th Class: Initcap.

  • 5th Class: Lowercase, includes active ingredient(s) or group(s) and route of administration, separated by a semicolon “;”. Multiple routes for one product are separated by commas.

    Example:

concept_id concept_code class concept_name
21605007 R ATC 1st RESPIRATORY SYSTEM
21603248 R03 ATC 2nd DRUGS FOR OBSTRUCTIVE AIRWAY DISEASES
21603327 R03D ATC 3rd OTHER SYSTEMIC DRUGS FOR OBSTRUCTIVE AIRWAY DISEASES
21603328 R03DA ATC 4th Xanthines
21603333 R03DA05 ATC 5th aminophylline; systemic, rectal
  • Some 5th Classes ('combinations' or 'various' classes) include their 4th-level class name first. Deprecated or updated codes are marked with “[D]” or “[U]”.
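Because the 5th-class name format is regular, it can be split into its ingredient part and its routes. The following is a minimal sketch, assuming the name contains at most one route section after the last semicolon. The function name split_atc5_name is hypothetical, and the sketch does not handle “[D]” or “[U]” markers.

```python
def split_atc5_name(name):
    """Split an ATC 5th-class concept name into (ingredient_part, routes).

    The text before the last ';' is kept as one string, because ingredient
    lists may themselves contain commas. Routes are split on commas.
    """
    ingredient_part, separator, route_part = name.rpartition(";")
    if not separator:
        # No semicolon: no route section in this name.
        return name.strip(), []
    routes = [route.strip() for route in route_part.split(",") if route.strip()]
    return ingredient_part.strip(), routes

print(split_atc5_name("aminophylline; systemic, rectal"))
print(split_atc5_name("lansoprazole, amoxicillin and metronidazole; systemic"))
```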

OMOP Conversion of ATC Administration Routes

Routes in ATC are converted into OMOPized routes, with key changes:

  • "O" (oral) is split into “oral” (enteral) and “local oral” (e.g., dental products).
  • Combination of oral and parenteral routes is classified as "systemic".
  • Different inhalation forms are combined into a single “inhalation” form.

Concept Status (Standardness)

Valid ATC codes are classified as "Classification", while invalid ATC codes are considered "Non-standard".

Domains

All ATC concepts belong to the "Drug" domain.

Concept Relationships

Relationships exist within ATC and with many other vocabularies. Only relationships between ATC and RxNorm (Maps to, ATC - RxNorm, ATC - RxNorm primary lateral/secondary lateral/primary uphill/secondary uphill) are actively curated, while others are legacy. For more details see Main ATC-specific relationships.

Source codes

Codes are obtained from the current WHOCC ATC/DDD Index, processed and then handled according to the instructions that can be found in the ATC github folder. DDD alterations and ATC code changes are automatically processed through our ATC web scraper. Codes that are discontinued without a replacement are labeled as "D" (deprecated), while those with a replacement code are labeled as "U" (upgraded).

Transformation from source vocabulary into the OHDSI Standardized Vocabularies

The code for ATC integration into the OMOP Standard Vocabularies can be found on the OHDSI GitHub. In the August 2024 release, we changed the ATC - RxNorm link algorithm. The previous knowledge-engineering approach deconstructed the ATC concepts into attributes (ingredient and route of administration) and matched them with the RxNorm attributes, such as Dose Form and Ingredient. More information can be found here. Since August 2024, we have been building ATC - RxNorm relationships from external sources through a data-driven approach.

Step 1. Data collection

We developed SQL scripts to gather and integrate data from several authoritative external sources. The external data sources used are listed below:

Source Description
UMLS (link, link) Unified Medical Language System: a comprehensive set of biomedical terms and concepts maintained by the U.S. National Library of Medicine, designed to integrate multiple health and biomedical vocabularies and standards.
DMD (link, link) Dictionary of Medicines and Devices (UK): a standardized database of medicines and medical devices used in the UK, providing consistent information for electronic prescribing and healthcare systems.
GRR (link, link) Global Reference Repository (IQVIA): a global drug database developed by IQVIA, providing standardized, structured information on pharmaceutical products for regulatory and commercial use worldwide.
VANDF (link, link) Veterans Affairs National Drug File: a drug terminology system used by the U.S. Department of Veterans Affairs for managing medication information, including drug names, classes, and interactions.
BDPM (link, link) Base de Données Publique des Médicaments (France): the public drug database of France, providing official information about medicines authorized for use in the country, including indications, dosages, and safety information.
Z-Index (link) Z-Index maintains and updates the G-Standaard, the Dutch drug database which is used by all parties in healthcare in the Netherlands. The G-Standaard contains all the products that are dispensed by or used in the pharmacy. Proprietary data.
Norway drug bank (link) Norway Drug Bank (Legemiddelverket): a national database maintained by the Norwegian Medicines Agency, containing detailed information about approved medicines in Norway.
JMDC (link, link) Japan Medical Data Center: a medical data provider in Japan offering a wide range of healthcare-related data, including drug databases used for research, analysis, and insurance claims.
KDC (link) Korean Drug Code: a drug coding system managed by the Ministry of Food and Drug Safety (MFDS) in Korea, providing standardized information on pharmaceutical products in Korea.

Step 2. Data harmonization

External sources provide relationships between ATC concepts and Drugs at different levels. In some cases, they classify Branded Drugs (concepts with ingredient, brand, dose form, and drug strength), in others - Clinical Drugs (concepts with ingredient, dose form, and drug strength), Clinical Drug Forms (Ingredient and dose form), or even simple Ingredients.

To ensure maximum coverage, we used only mappings at the level of Clinical Drug Form or below from these sources. After extracting the relationships to RxNorm concepts at these levels, we leveraged the concept_ancestor table to roll up mappings from classes below Clinical Drug Forms back to Clinical Drug Forms.

For example, if the source provides the mapping for H02AB06 prednisolone to an RxNorm Branded Drug such as prednisolone 25 MG/ML Injectable Solution [Decortin], we then transition upwards in the hierarchy to the corresponding RxNorm Clinical Drug Form like prednisolone Injectable Solution.

This approach allows us to generalize mappings to standardized forms without losing essential ingredient and dose form information.
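The roll-up described above can be sketched in a few lines. This is an illustration only: it keys concepts by name instead of concept_id, the ancestors dictionary stands in for the concept_ancestor table, and the function name roll_up_to_clinical_drug_form is hypothetical. The actual harmonization is implemented in SQL.

```python
def roll_up_to_clinical_drug_form(mappings, ancestors, concept_class):
    """Replace each mapped drug with its Clinical Drug Form ancestor(s).

    mappings:      iterable of (atc_code, drug) pairs from an external source
    ancestors:     drug -> list of its ancestors (stand-in for concept_ancestor)
    concept_class: concept -> concept class name
    """
    rolled_up = set()
    for atc_code, drug in mappings:
        if concept_class.get(drug) == "Clinical Drug Form":
            # Already at the target level, keep as is.
            rolled_up.add((atc_code, drug))
            continue
        for ancestor in ancestors.get(drug, ()):
            if concept_class.get(ancestor) == "Clinical Drug Form":
                rolled_up.add((atc_code, ancestor))
    return rolled_up

branded = "prednisolone 25 MG/ML Injectable Solution [Decortin]"
concept_class = {
    branded: "Branded Drug",
    "prednisolone Injectable Solution": "Clinical Drug Form",
    "prednisolone": "Ingredient",
}
ancestors = {branded: ["prednisolone Injectable Solution", "prednisolone"]}

result = roll_up_to_clinical_drug_form(
    [("H02AB06", branded)], ancestors, concept_class
)
print(result)
```

Mappings that point at an Ingredient have no Clinical Drug Form ancestor and therefore drop out of the result, which matches the decision to use only mappings at the Clinical Drug Form level or below.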

Step 3. Expansion of forms

We used Dose Form Groups to add new ATC - RxNorm relationships.

Dose Form Groups are internal RxNorm classifiers that unite drugs with related dose forms (e.g., Oral Tablets and Oral Capsules). We used them to construct links between ATC codes and drugs in oral capsules if a relationship between the ATC code and drugs in oral tablets exists.
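A simplified sketch of this expansion is shown below. It assumes that each dose form belongs to a single Dose Form Group, which is a simplification, and the drug names, group names, and the function name expand_by_dose_form_group are illustrative rather than taken from the production scripts.

```python
def expand_by_dose_form_group(links, form_of, group_of):
    """Add links to drug forms that share ingredient and Dose Form Group.

    links:    set of (atc_code, clinical_drug_form) pairs
    form_of:  clinical_drug_form -> (ingredient, dose_form)
    group_of: dose_form -> dose form group (one group per form, simplified)
    """
    expanded = set(links)
    for atc_code, linked_form in links:
        ingredient, dose_form = form_of[linked_form]
        group = group_of.get(dose_form)
        if group is None:
            continue  # no group known, nothing to expand
        for candidate, (cand_ingredient, cand_dose_form) in form_of.items():
            same_group = group_of.get(cand_dose_form) == group
            if cand_ingredient == ingredient and same_group:
                expanded.add((atc_code, candidate))
    return expanded

# Illustrative data only.
form_of = {
    "prednisolone Oral Tablet": ("prednisolone", "Oral Tablet"),
    "prednisolone Oral Capsule": ("prednisolone", "Oral Capsule"),
    "prednisolone Injectable Solution": ("prednisolone", "Injectable Solution"),
}
group_of = {
    "Oral Tablet": "Oral Product",
    "Oral Capsule": "Oral Product",
    "Injectable Solution": "Injectable Product",
}
links = {("H02AB06", "prednisolone Oral Tablet")}
expanded = expand_by_dose_form_group(links, form_of, group_of)
print(sorted(expanded))
```

The injectable form is not added, because it sits in a different Dose Form Group than the tablet that was originally linked.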

Step 4. Data cleaning

Several validation methods were applied to verify the results.

  1. New relationships were compared with the existing pool of relationships. The delta between the two sets was reviewed semi-automatically.
  2. ATC concept names were compared with related drug concept sets to check ingredient matching and matching of routes of administration with drug forms (e.g., Oral in ATC vs Oral Tablet in RxNorm).
  3. The numbers of ingredients in ATC and RxNorm concept names were compared with each other to exclude erroneous assignment of multicomponent ATC codes to mono-ingredient drugs and vice versa.
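The third check can be approximated with simple string handling. The sketch below is a naive heuristic, not the validation logic actually used: it assumes ATC names list components separated by commas or "and" before the semicolon, and that RxNorm names separate ingredients with " / ". Group words such as "thiazides" or "combinations" are counted as one component each. All function names are hypothetical.

```python
import re

def atc_ingredient_count(atc_name):
    """Count components listed before the ';' in an ATC 5th-class name."""
    ingredient_part = atc_name.split(";")[0]
    parts = re.split(r",\s*|\s+and\s+", ingredient_part)
    return len([part for part in parts if part.strip()])

def rx_ingredient_count(rx_name):
    """Count ingredients in an RxNorm drug name (separated by ' / ')."""
    return len(rx_name.split(" / "))

def counts_agree(atc_name, rx_name):
    """True when both names list the same number of components."""
    return atc_ingredient_count(atc_name) == rx_ingredient_count(rx_name)

rx_name = "acebutolol / hydrochlorothiazide Oral Tablet"
print(counts_agree("acebutolol and thiazides; systemic", rx_name))
print(counts_agree("acebutolol; systemic", rx_name))
```

A mismatch flags a link for review; it does not by itself prove the link is wrong.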

Step 5. Building new ancestry

The final step of the data processing is cleaning the relationships for the concept_ancestor table. This was done with the help of a ranking system.

As ATC does not provide unambiguous classification for every drug on the global market, the ATC 5th - RxNorm drug relationships were ranked to further filter the links. For each RxNorm/RxE drug, only relationships with the highest-ranked ATC codes were included in the concept_ancestor table.

The ranking was arranged as follows:

  1. Manual links (COVID-19 vaccines, other vaccines, insulins),
  2. Links between mono-ingredient RxNorm/RxE drugs and ATC codes that have one ingredient in their name,
  3. Links between two-ingredient RxNorm/RxE drugs and ATC codes that have two ingredients in their name,
  4. Links between 3/4-ingredient RxNorm/RxE drugs and ATC codes that have 3/4 ingredients in their name,
  5. Links between RxNorm/RxE drugs and ATC codes formulated as “combination of”
  6. Links between RxNorm/RxE combo-drugs and ATC codes formulated as “Ingredient A + group B” (such as C07BB04 acebutolol and thiazides; systemic)
  7. Everything else.

For example, the links from the sources for the RxNorm drug acebutolol / hydrochlorothiazide Oral Tablet were ranked as follows:

atc_code atc_name rx_id rx_name order
C07BB04 acebutolol and thiazides; systemic 40003424 acebutolol / hydrochlorothiazide Oral Tablet 6
C07AB04 acebutolol; systemic 40003424 acebutolol / hydrochlorothiazide Oral Tablet 7
C03AX01 hydrochlorothiazide, combinations; oral 40003424 acebutolol / hydrochlorothiazide Oral Tablet 7
C03AA03 hydrochlorothiazide; oral 40003424 acebutolol / hydrochlorothiazide Oral Tablet 7

The first link was ranked the highest as this drug can be most precisely classified under “acebutolol and thiazides; systemic”. Only this link was used as hierarchical in concept_ancestor.
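The filtering step can be sketched as follows, using the four ranked links from the table above. A lower number means a higher rank. The function name best_ranked_links is an assumption for illustration, and ties at the best rank are all kept in this sketch.

```python
# (atc_code, rx_id, rank) rows taken from the example table above.
links = [
    ("C07BB04", 40003424, 6),
    ("C07AB04", 40003424, 7),
    ("C03AX01", 40003424, 7),
    ("C03AA03", 40003424, 7),
]

def best_ranked_links(links):
    """Keep, for each drug, only the links with the best (lowest) rank."""
    best_rank = {}
    for _atc_code, rx_id, rank in links:
        best_rank[rx_id] = min(rank, best_rank.get(rx_id, rank))
    return [
        (atc_code, rx_id)
        for atc_code, rx_id, rank in links
        if rank == best_rank[rx_id]
    ]

hierarchical_links = best_ranked_links(links)
print(hierarchical_links)
```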

NB! Since the relationships for ATC codes have been removed only from the concept_ancestor table, they are still available in the concept_relationship table. Therefore, the number of links in these two tables will not match. It is a design choice to provide both specificity (concept_ancestor) and sensitivity (concept_relationship).

Other information

Main ATC-specific relationships

relationship_id reverse target vocabulary_id description
Maps to Mapped from RxNorm | RxNorm Extension
  • Connects an ATC Class and its Standard semantic equivalent in the form of an active ingredient (in order to simplify ETL work).
  • Non-hierarchical
  • Depends on the existence of ‘ATC - RxNorm pr lat’ and ‘ATC - RxNorm sec lat’ relationship_ids
  • Can be assigned to 5th ATC class only
ATC - RxNorm RxNorm - ATC RxNorm | RxNorm Extension
  • connects ATC Classes and Drug Products, but not Ingredients
  • technically non-hierarchical; however, semantically resembles ‘Subsumes’ - ‘Is a’
ATC - RxNorm pr lat RxNorm - ATC pr lat RxNorm | RxNorm Extension
  • Connects:
    • a Monocomponent ATC Class with its full semantic equivalent in the form of the main active Standard Ingredient
    • the 1st active ingredient of a Multicomponent ATC Class with a Standard Ingredient.
  • Semantically, it is equivalent to the ‘Maps to’ relationship_id, and usually (with some exceptions) has such a pair in concept_relationship_stage
  • Non-hierarchical
  • The presence of this link is a prerequisite for creating a ‘Maps to’ relationship_id.
  • Can be used for a group of similar ingredients (e.g. senna glycosides, estrogens)
ATC - RxNorm sec lat RxNorm - ATC sec lat RxNorm | RxNorm Extension
  • Connects the 2nd active ingredient OR any subsequent active ingredient of a Multicomponent ATC Class to Rx/RxE Ingredient.
  • Non-hierarchical
  • The presence of this link is a prerequisite for creating a ‘Maps to’ relationship_id.
ATC - RxNorm pr up RxNorm - ATC pr up RxNorm | RxNorm Extension
  • Connects the 1st Ingredient Group of a Multicomponent ATC Class to respective Rx/RxE Ingredients.
  • Applied to a Multicomponent ATC Class, whose name:
    • starts with either an Ingredient Group or the word ‘combinations’
    • Contains only one Ingredient Group, which is mentioned somewhere in the middle
  • Hierarchical (applied at the post-processing, gives +1 level of separation)
  • Can be used for a well-defined but diverse group of ingredients (e.g. diastase, tree pollen, feather)
ATC - RxNorm sec up RxNorm - ATC sec up RxNorm | RxNorm Extension
  • Connects the 2nd Ingredient Group, OR any subsequent Ingredient Group, OR an unspecified list of Ingredients that is not mentioned in the name and can be in combination with Primary lateral or Primary upward ATC Classes, to respective Rx/RxE Ingredients.
  • Hierarchical (applied at the post-processing, gives +1 level of separation)


Possible combinations of ATC-specific relationships used for Ingredients:

  1. ATC - RxNorm pr lat + ATC - RxNorm sec lat
  2. ATC - RxNorm pr lat + ATC - RxNorm sec up
  3. ATC - RxNorm pr up + ATC - RxNorm sec up
  4. ATC - RxNorm pr up + ATC - RxNorm sec lat

Other Relationships

relationship_id reverse target vocabulary_id description
Drug class of drug (reverse) Drug has drug class (initial) Multilex | RxNorm | RxNorm Extension
  • Connects ATC and RxN/RxE Drug Products which are mostly represented with Marketed Products
  • Reverse of initial relationship_id of ‘Drug has drug class’
  • Legacy relationships - are not supported
Is a Subsumes ATC
  • Connects a lower ATC Class and higher ATC Class
ATC - SNOMED eq (reverse) SNOMED - ATC eq (initial) SNOMED
  • Connects ATC Classes and SNOMED Pharma or Biological Products, Physical Objects and Substances
  • Non-hierarchical
  • Is a crosslink between SNOMED and ATC Drug Classes.
  • Legacy relationships - are not supported
ATC to NDFRT eq (reverse) NDFRT to ATC eq (initial) NDFRT
  • Is a cross-link between ATC Classes and National Drug File - Reference Terminology.
  • Non-hierarchical
  • Legacy relationships - are not supported
ATC to VA Class eq (reverse) VA Class to ATC eq (initial) VA Class
  • Non-hierarchical
  • Is a cross-link between ATC and Veteran Affairs drug classification.
  • Legacy relationships - are not supported