Vocab. ATC - OHDSI/Vocabulary-v5.0 GitHub Wiki
Authors: Alexander Davydov, Anna Ostropolets, Oleg Zhuk, Anton Tatur, Vlad Korsik, Polina Talapova, Christian Reich.
In OHDSI Standardized Vocabularies, the Anatomical Therapeutic Chemical (ATC) Classification System is used as the basis of the Drug hierarchy. The vocabulary itself is maintained by the World Health Organization Collaborating Centre for Drug Statistics Methodology (WHOCC) and is built around the organ or system the drugs target and their therapeutic, pharmacological, and chemical characteristics. ATC codes function as semantic identifiers. RxNorm and RxNorm Extensions are hierarchical descendants of the ATC concepts.
ATC can be used to support the mapping of source drugs to their standard counterparts in the OHDSI Vocabularies, or to retrieve standard drugs through the drug hierarchy.
All ATC concepts are either "Classification" (Valid concepts) or "Non-standard" (Invalid concepts). You can use “Maps to” links to get RxNorm ingredients if you have ATC codes in your source. Some of the ATC 5th level codes have multiple ingredients. Levels beyond ATC 5th do not have links to ingredients.
Use case 1: to get all drugs that belong to an ATC class
An example is retrieving all drugs that belong to glucocorticosteroids and have a systemic route (oral or injectable but not topical). To do that, you’d first specify a list of ATC codes:
concept_id | concept_code | concept_class | concept_name |
21602723 | H02A | ATC 3rd | CORTICOSTEROIDS FOR SYSTEMIC USE, PLAIN |
21602745 | H02B | ATC 3rd | CORTICOSTEROIDS FOR SYSTEMIC USE, COMBINATIONS |
In the hierarchy, ATC codes are positioned above the RxNorm and RxNorm Extensions codes. They are considered ancestors, while RxNorm / RxNorm Extensions are descendants.
You’d then select all descendants of these two codes in Atlas, or query the concept_ancestor table with the two ATC concepts as ancestor_concept_id. Note that the resulting set also contains ingredients, which you need to exclude if you are looking only for drugs with a systemic route (an ingredient has no route specified).
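The concept_ancestor query described above can be sketched as follows. This is a minimal illustration against an in-memory SQLite database, not the Atlas implementation. The table and column names follow the OMOP vocabulary tables, but the two descendant rows and their negative concept_ids are made-up placeholders, and `build_toy_db` and `systemic_drugs` are hypothetical helper names.

```python
import sqlite3


def build_toy_db():
    """Create a tiny in-memory stand-in for the OMOP vocabulary tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (
            concept_id INTEGER PRIMARY KEY,
            concept_name TEXT,
            vocabulary_id TEXT,
            concept_class_id TEXT
        );
        CREATE TABLE concept_ancestor (
            ancestor_concept_id INTEGER,
            descendant_concept_id INTEGER
        );
    """)
    conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?)", [
        (21602723, "CORTICOSTEROIDS FOR SYSTEMIC USE, PLAIN", "ATC", "ATC 3rd"),
        (21602745, "CORTICOSTEROIDS FOR SYSTEMIC USE, COMBINATIONS", "ATC", "ATC 3rd"),
        # Placeholder descendants: negative ids are NOT real concept_ids.
        (-1, "prednisolone", "RxNorm", "Ingredient"),
        (-2, "prednisolone Oral Tablet", "RxNorm", "Clinical Drug Form"),
    ])
    conn.executemany("INSERT INTO concept_ancestor VALUES (?, ?)", [
        (21602723, -1),
        (21602723, -2),
    ])
    return conn


def systemic_drugs(conn, atc_concept_ids):
    """Return RxNorm / RxNorm Extension descendants of the given ATC
    concepts, excluding Ingredients (which carry no route)."""
    marks = ",".join("?" for _ in atc_concept_ids)
    sql = f"""
        SELECT DISTINCT c.concept_id, c.concept_name, c.concept_class_id
        FROM concept_ancestor ca
        JOIN concept c ON c.concept_id = ca.descendant_concept_id
        WHERE ca.ancestor_concept_id IN ({marks})
          AND c.vocabulary_id IN ('RxNorm', 'RxNorm Extension')
          AND c.concept_class_id <> 'Ingredient'
        ORDER BY c.concept_id
    """
    return conn.execute(sql, list(atc_concept_ids)).fetchall()


conn = build_toy_db()
rows = systemic_drugs(conn, [21602723, 21602745])
```

Against a real vocabulary database, the same SELECT statement should work once the toy tables are replaced by the actual concept and concept_ancestor tables.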
Use case 2: to get all ingredients for an ATC code
ATC codes are directly mapped to Ingredients. There are four types of relationships between ATC codes and ingredients: ATC - RxNorm pr lat, ATC - RxNorm sec lat, ATC - RxNorm pr up, and ATC - RxNorm sec up (explained below).
concept_id | concept_code | concept_class | concept_name |
21600104 | A02BD03 | ATC 5th | lansoprazole, amoxicillin and metronidazole; systemic |
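A lookup of the ingredients behind such a code could be sketched like this, using an in-memory SQLite database. The relationship_id values are the four named above. The ingredient rows, their negative concept_ids, and the relationship type assigned to each ingredient are illustrative assumptions rather than actual vocabulary content, and `ingredients_for_atc` is a hypothetical helper name.

```python
import sqlite3

ATC_INGREDIENT_RELATIONSHIPS = (
    "ATC - RxNorm pr lat",
    "ATC - RxNorm sec lat",
    "ATC - RxNorm pr up",
    "ATC - RxNorm sec up",
)


def build_toy_db():
    """Tiny in-memory stand-in for concept and concept_relationship."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (
            concept_id INTEGER PRIMARY KEY,
            concept_code TEXT,
            concept_name TEXT,
            vocabulary_id TEXT,
            concept_class_id TEXT
        );
        CREATE TABLE concept_relationship (
            concept_id_1 INTEGER,
            concept_id_2 INTEGER,
            relationship_id TEXT
        );
    """)
    conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?, ?)", [
        (21600104, "A02BD03",
         "lansoprazole, amoxicillin and metronidazole; systemic", "ATC", "ATC 5th"),
        # Placeholder ingredient rows: ids and codes are NOT real.
        (-1, "x1", "lansoprazole", "RxNorm", "Ingredient"),
        (-2, "x2", "amoxicillin", "RxNorm", "Ingredient"),
        (-3, "x3", "metronidazole", "RxNorm", "Ingredient"),
    ])
    conn.executemany("INSERT INTO concept_relationship VALUES (?, ?, ?)", [
        # Assumed relationship types, for illustration only.
        (21600104, -1, "ATC - RxNorm pr lat"),
        (21600104, -2, "ATC - RxNorm sec lat"),
        (21600104, -3, "ATC - RxNorm sec lat"),
    ])
    return conn


def ingredients_for_atc(conn, atc_code):
    """Return (ingredient name, relationship_id) pairs for one ATC code."""
    marks = ",".join("?" for _ in ATC_INGREDIENT_RELATIONSHIPS)
    sql = f"""
        SELECT ing.concept_name, cr.relationship_id
        FROM concept atc
        JOIN concept_relationship cr ON cr.concept_id_1 = atc.concept_id
        JOIN concept ing ON ing.concept_id = cr.concept_id_2
        WHERE atc.vocabulary_id = 'ATC'
          AND atc.concept_code = ?
          AND cr.relationship_id IN ({marks})
          AND ing.concept_class_id = 'Ingredient'
        ORDER BY ing.concept_name
    """
    return conn.execute(sql, (atc_code, *ATC_INGREDIENT_RELATIONSHIPS)).fetchall()


conn = build_toy_db()
links = ingredients_for_atc(conn, "A02BD03")
```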
ATC codes are alphanumeric and vary by classification level:
- 1st Class: One letter (e.g., A).
- 2nd Class: One letter + two digits (e.g., A01).
- 3rd Class: One letter + two digits + one letter (e.g., A01A).
- 4th Class: One letter + two digits + two letters (e.g., A01AA).
- 5th Class: Full code for specific ingredients (e.g., A01AA01 for Sodium fluoride).
Each level adds more specificity to the drug's classification.
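As a rough illustration, the code shapes shown in the examples above (A, A01, A01A, A01AA, A01AA01) can be recognized with regular expressions. This is only a sketch of the code format: `atc_concept_class` is a hypothetical helper, and a code that matches a pattern is not necessarily a valid ATC code.

```python
import re

# One pattern per level, following the example codes A, A01, A01A, A01AA, A01AA01.
ATC_LEVEL_PATTERNS = [
    ("ATC 1st", re.compile(r"[A-Z]")),
    ("ATC 2nd", re.compile(r"[A-Z][0-9]{2}")),
    ("ATC 3rd", re.compile(r"[A-Z][0-9]{2}[A-Z]")),
    ("ATC 4th", re.compile(r"[A-Z][0-9]{2}[A-Z]{2}")),
    ("ATC 5th", re.compile(r"[A-Z][0-9]{2}[A-Z]{2}[0-9]{2}")),
]


def atc_concept_class(code):
    """Return the concept class implied by the shape of the code, or None."""
    for concept_class, pattern in ATC_LEVEL_PATTERNS:
        if pattern.fullmatch(code):
            return concept_class
    return None
```

For example, `atc_concept_class("R03DA05")` returns `"ATC 5th"`, while a malformed string such as `"R3DA"` returns `None`.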
The concept names follow this format:
- 1st-3rd Classes: Uppercase.
- 4th Class: Initcap.
- 5th Class: Lowercase, includes active ingredient(s) or group(s) and route of administration, separated by a semicolon “;”. Multiple routes for one product are separated by commas.
Example:
concept_id | concept_code | concept_class | concept_name |
21605007 | R | ATC 1st | RESPIRATORY SYSTEM |
21603248 | R03 | ATC 2nd | DRUGS FOR OBSTRUCTIVE AIRWAY DISEASES |
21603327 | R03D | ATC 3rd | OTHER SYSTEMIC DRUGS FOR OBSTRUCTIVE AIRWAY DISEASES |
21603328 | R03DA | ATC 4th | Xanthines |
21603333 | R03DA05 | ATC 5th | aminophylline; systemic, rectal |
- Some 5th Class names (the 'combinations' or 'various' classes) include their 4th-level class name first. Deprecated or updated codes are marked with “[D]” or “[U]”.
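The 5th Class name format (substance part, a semicolon, then comma-separated routes) can be split apart with a few lines of Python. This is a sketch under the assumption that a name contains at most one semicolon. `parse_atc5_name` is a hypothetical helper, and it does not treat the “[D]” or “[U]” markers specially.

```python
def parse_atc5_name(concept_name):
    """Split a 5th Class concept name into (substance part, list of routes)."""
    substance, _, routes = concept_name.partition(";")
    route_list = [route.strip() for route in routes.split(",") if route.strip()]
    return substance.strip(), route_list


parsed = parse_atc5_name("aminophylline; systemic, rectal")
# parsed == ("aminophylline", ["systemic", "rectal"])
```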
Routes in ATC are converted into OMOPized routes, with key changes:
- "O" (oral) is split into “oral” (enteral) and “local oral” (e.g., dental products).
- Combination of oral and parenteral routes is classified as "systemic".
- Different inhalation forms are combined into a single “inhalation” form.
Valid ATC codes are classified as "Classification", while invalid ATC codes are considered "Non-standard".
All ATC concepts belong to the "Drug" domain.
Relationships exist within ATC and between ATC and many other vocabularies. Only relationships between ATC and RxNorm (Maps to, ATC - RxNorm, ATC - RxNorm primary lateral/secondary lateral/primary uphill/secondary uphill) are actively curated, while the others are legacy. For more details see Main ATC-specific relationships.
Codes are obtained from the current WHOCC ATC/DDD Index, processed, and then handled according to the instructions in the ATC GitHub folder. DDD alterations and ATC code changes are processed automatically through our ATC web scraper. Codes that are discontinued without a replacement are labeled as "D" (deprecated), while those with a replacement code are labeled as "U" (upgraded).
The code for ATC integration into the OMOP Standard Vocabularies can be found on the OHDSI GitHub. In the August 2024 release, we changed the ATC - RxNorm link algorithm. The previous knowledge-engineering approach deconstructed ATC concepts into attributes (ingredient and route of administration) and matched them with RxNorm attributes such as Dose Form and Ingredient. More information can be found here. Since August 2024, ATC - RxNorm relationships have been built from external sources through a data-driven approach.
We developed SQL scripts to gather and integrate data from several authoritative sources. The external data sources used are listed below:
Source | Description |
UMLS (link, link) | UMLS (Unified Medical Language System): A comprehensive set of biomedical terms and concepts maintained by the U.S. National Library of Medicine, designed to integrate multiple health and biomedical vocabularies and standards. |
DMD (link, link) | DMD (Dictionary of Medicines and Devices - UK): A standardized database of medicines and medical devices used in the UK, providing consistent information for electronic prescribing and healthcare systems. |
GRR (link, link) | GRR (Global Reference Repository - IQVIA): A global drug database developed by IQVIA, providing standardized, structured information on pharmaceutical products for regulatory and commercial use worldwide. |
VANDF (link, link) | VANDF (Veterans Affairs National Drug File): A drug terminology system used by the U.S. Department of Veterans Affairs for managing medication information, including drug names, classes, and interactions. |
BDPM (link, link) | BDPM (Base de Données Publique des Médicaments - France): The public drug database of France, providing official information about medicines authorized for use in the country, including indications, dosages, and safety information. |
Z-Index (link) | Z-Index maintains and updates the G-Standaard, the Dutch drug database which is used by all parties in healthcare in the Netherlands. The G-Standaard contains all the products that are dispensed by or used in the pharmacy. Proprietary data. |
Norway drug bank (link) | Norway Drug Bank (Legemiddelverket): A national database maintained by the Norwegian Medicines Agency, containing detailed information about approved medicines in Norway. |
JMDC (link, link) | JMDC (Japan Medical Data Center): A medical data provider in Japan offering a wide range of healthcare-related data, including drug databases used for research, analysis, and insurance claims. |
KDC (link) | KDC (Korean Drug Code): A drug coding system managed by the Ministry of Food and Drug Safety (MFDS) in Korea, providing standardized information on pharmaceutical products in Korea. |
External sources provide relationships between ATC concepts and Drugs at different levels. In some cases they classify Branded Drugs (concepts with ingredient, brand, dose form, and drug strength); in others, Clinical Drugs (concepts with ingredient, dose form, and drug strength), Clinical Drug Forms (concepts with ingredient and dose form), or even simple Ingredients.
To ensure maximum coverage, we used only mappings at the level of Clinical Drug Form or below from these sources. After extracting the relationships to RxNorm concepts at these levels, we leveraged the concept_ancestor table to roll up mappings from classes below Clinical Drug Forms back to Clinical Drug Forms.
For example, if the source provides the mapping for H02AB06 prednisolone to an RxNorm Branded Drug such as prednisolone 25 MG/ML Injectable Solution [Decortin], we then transition upwards in the hierarchy to the corresponding RxNorm Clinical Drug Form like prednisolone Injectable Solution.
This approach allows us to generalize mappings to standardized forms without losing essential ingredient and dose form information.
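The roll-up through concept_ancestor can be sketched as below, again with an in-memory SQLite database. This is an illustration of the idea, not the production script. The `source_link` table, the negative concept_ids, and the `roll_up_to_drug_form` helper are hypothetical, and the toy data assume that concept_ancestor lists every concept as its own ancestor.

```python
import sqlite3


def build_toy_db():
    """Tiny in-memory stand-in; all concept_ids are placeholders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (
            concept_id INTEGER PRIMARY KEY,
            concept_name TEXT,
            concept_class_id TEXT
        );
        CREATE TABLE concept_ancestor (
            ancestor_concept_id INTEGER,
            descendant_concept_id INTEGER
        );
        -- Hypothetical staging table: ATC code to RxNorm concept, as given by a source.
        CREATE TABLE source_link (atc_code TEXT, rx_concept_id INTEGER);
    """)
    conn.executemany("INSERT INTO concept VALUES (?, ?, ?)", [
        (-1, "prednisolone 25 MG/ML Injectable Solution [Decortin]", "Branded Drug"),
        (-2, "prednisolone 25 MG/ML Injectable Solution", "Clinical Drug"),
        (-3, "prednisolone Injectable Solution", "Clinical Drug Form"),
    ])
    conn.executemany("INSERT INTO concept_ancestor VALUES (?, ?)", [
        (-1, -1), (-2, -2), (-3, -3),   # self-links
        (-2, -1), (-3, -1), (-3, -2),   # hierarchy above the Branded Drug
    ])
    conn.execute("INSERT INTO source_link VALUES ('H02AB06', -1)")
    return conn


def roll_up_to_drug_form(conn):
    """Replace each source link by links to its Clinical Drug Form ancestors."""
    sql = """
        SELECT DISTINCT s.atc_code, cdf.concept_id, cdf.concept_name
        FROM source_link s
        JOIN concept_ancestor ca ON ca.descendant_concept_id = s.rx_concept_id
        JOIN concept cdf ON cdf.concept_id = ca.ancestor_concept_id
        WHERE cdf.concept_class_id = 'Clinical Drug Form'
        ORDER BY s.atc_code, cdf.concept_id
    """
    return conn.execute(sql).fetchall()


conn = build_toy_db()
rolled_up = roll_up_to_drug_form(conn)
```

Because of the self-links, a source link that already points at a Clinical Drug Form passes through unchanged.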
We used Dose Form Groups to add new ATC - RxNorm relationships.
Dose Form Groups are internal RxNorm classifiers that unite drugs with related dose forms (e.g., Oral Tablets and Oral Capsules). We used them to construct links between ATC codes and drugs in oral capsules if a relationship between the ATC code and drugs in oral tablets exists.
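A minimal sketch of this expansion step is shown below. It assumes that a link is only extended to drug forms with the same ingredient, which the text does not state explicitly. The function name, the data layout, and the ids are hypothetical, and the group names are used for illustration only.

```python
from collections import defaultdict


def expand_by_dose_form_group(atc_links, drug_forms, form_groups):
    """Extend ATC links to drug forms sharing ingredient and dose form group.

    atc_links:   set of (atc_code, drug_form_id)
    drug_forms:  dict drug_form_id -> (ingredient, dose_form)
    form_groups: dict dose_form -> dose form group
    """
    # Index drug forms by (ingredient, dose form group).
    by_key = defaultdict(set)
    for form_id, (ingredient, dose_form) in drug_forms.items():
        group = form_groups.get(dose_form)
        if group is not None:
            by_key[(ingredient, group)].add(form_id)

    expanded = set(atc_links)
    for atc_code, form_id in atc_links:
        ingredient, dose_form = drug_forms[form_id]
        group = form_groups.get(dose_form)
        if group is None:
            continue  # no group known: keep only the original link
        for sibling_id in by_key[(ingredient, group)]:
            expanded.add((atc_code, sibling_id))
    return expanded


# Placeholder ids and illustrative group names.
drug_forms = {
    1: ("prednisolone", "Oral Tablet"),
    2: ("prednisolone", "Oral Capsule"),
    3: ("prednisolone", "Injectable Solution"),
}
form_groups = {
    "Oral Tablet": "Oral Product",
    "Oral Capsule": "Oral Product",
    "Injectable Solution": "Injectable Product",
}
expanded = expand_by_dose_form_group({("H02AB06", 1)}, drug_forms, form_groups)
```

In this toy run the link to the oral tablet is extended to the oral capsule, while the injectable solution stays unlinked.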
- New relationships were compared with the existing pool of relationships. The delta between the two sets has been reviewed semi-automatically.
- ATC concept names were compared with related drug concept sets to check that ingredients match and that routes of administration match drug forms (e.g., Oral in ATC vs Oral Tablet in RxNorm).
- The number of ingredients in ATC and RxNorm concept names was compared to each other to exclude erroneous assignment of multicomponent ATC codes to mono-ingredient drugs and vice versa.
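The ingredient-count check in the last item can be approximated from the concept names alone. The heuristic below is an assumption made for illustration, not the actual review logic: it counts " / " separated parts in the RxNorm name, and comma or "and" separated parts before the semicolon in the ATC name. It will miscount ATC names that use group words such as "combinations".

```python
import re


def count_rxnorm_ingredients(rx_name):
    """Count ingredients in an RxNorm drug name, which joins them with ' / '."""
    return len(rx_name.split(" / "))


def count_atc_ingredients(atc_name):
    """Count comma- or 'and'-separated parts before the route semicolon."""
    substance = atc_name.split(";")[0]
    parts = re.split(r",\s*|\s+and\s+", substance)
    return len([part for part in parts if part.strip()])


def counts_agree(atc_name, rx_name):
    """True when both names appear to list the same number of ingredients."""
    return count_atc_ingredients(atc_name) == count_rxnorm_ingredients(rx_name)
```

With these helpers, "acebutolol; systemic" and "acebutolol / hydrochlorothiazide Oral Tablet" disagree (one ingredient against two), so that pair would be flagged for review.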
The final step of the data processing is cleaning the relationships for the concept_ancestor table. It has been done with the help of a ranking system.
As ATC does not provide an unambiguous classification for every drug on the global market, the ATC 5th - RxNorm drug relationships were ranked to further filter the links. For each RxNorm/RxE drug, only the relationships to the highest-ranked ATC codes were included in the concept_ancestor table.
The ranking was arranged as follows:
- Manual links (COVID-19 vaccines, other vaccines, insulins),
- Links between mono-ingredient RxNorm/RxE drugs and ATC codes that have one ingredient in their name,
- Links between two-ingredient RxNorm/RxE drugs and ATC codes that have two ingredients in their name,
- Links between 3/4-ingredient RxNorm/RxE drugs and ATC codes that have 3/4 ingredients in their name,
- Links between RxNorm/RxE drugs and ATC codes formulated as “combination of”,
- Links between RxNorm/RxE combo-drugs and ATC codes formulated as “Ingredient A + group B” (such as C07BB04 acebutolol and thiazides; systemic),
- Everything else.
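For example, the links from the sources for the RxNorm drug acebutolol / hydrochlorothiazide Oral Tablet were ranked as follows (the order column gives the position in the ranking list above, so a lower number means a higher rank):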
atc_code | atc_name | rx_id | rx_name | order |
C07BB04 | acebutolol and thiazides; systemic | 40003424 | acebutolol / hydrochlorothiazide Oral Tablet | 6 |
C07AB04 | acebutolol; systemic | 40003424 | acebutolol / hydrochlorothiazide Oral Tablet | 7 |
C03AX01 | hydrochlorothiazide, combinations; oral | 40003424 | acebutolol / hydrochlorothiazide Oral Tablet | 7 |
C03AA03 | hydrochlorothiazide; oral | 40003424 | acebutolol / hydrochlorothiazide Oral Tablet | 7 |
The first link was ranked the highest as this drug can be most precisely classified under “acebutolol and thiazides; systemic”. Only this link was used as hierarchical in concept_ancestor.
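The filtering step can be sketched as follows, taking already ranked links as input. The rows come from the table above. Keeping every link that ties for the best rank is an assumption, and `best_ranked_links` is a hypothetical helper name.

```python
def best_ranked_links(ranked_links):
    """For each drug, keep only the ATC codes with the lowest order value.

    ranked_links: iterable of (atc_code, rx_id, order), where a lower
    order means a higher rank.
    """
    best = {}  # rx_id -> (best order seen so far, ATC codes at that order)
    for atc_code, rx_id, order in ranked_links:
        current = best.get(rx_id)
        if current is None or order < current[0]:
            best[rx_id] = (order, [atc_code])
        elif order == current[0]:
            current[1].append(atc_code)
    return {rx_id: codes for rx_id, (_, codes) in best.items()}


ranked = [
    ("C07BB04", 40003424, 6),
    ("C07AB04", 40003424, 7),
    ("C03AX01", 40003424, 7),
    ("C03AA03", 40003424, 7),
]
hierarchical = best_ranked_links(ranked)
# hierarchical == {40003424: ["C07BB04"]}
```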
NB! The lower-ranked relationships for ATC codes were removed only from the concept_ancestor table; they are still available in the concept_relationship table. Therefore, the number of links in these two tables will not match. This is a design choice to provide both specificity (concept_ancestor) and sensitivity (concept_relationship).
Main ATC-specific relationships
relationship_id | reverse | target vocabulary_id | description |
Maps to | Mapped from | RxNorm, RxNorm Extension | |
ATC - RxNorm | RxNorm - ATC | RxNorm, RxNorm Extension | |
ATC - RxNorm pr lat | RxNorm - ATC pr lat | RxNorm, RxNorm Extension | |
ATC - RxNorm sec lat | RxNorm - ATC sec lat | RxNorm, RxNorm Extension | |
ATC - RxNorm pr up | RxNorm - ATC pr up | RxNorm, RxNorm Extension | |
ATC - RxNorm sec up | RxNorm - ATC sec up | RxNorm, RxNorm Extension | |
Possible combinations of ATC-specific relationships used for Ingredients:
- ATC - RxNorm pr lat + ATC - RxNorm sec lat
- ATC - RxNorm pr lat + ATC - RxNorm sec up
- ATC - RxNorm pr up + ATC - RxNorm sec up
- ATC - RxNorm pr up + ATC - RxNorm sec lat
relationship_id | reverse | target vocabulary_id | description |
Drug class of drug (reverse) | Drug has drug class (initial) | Multilex, RxNorm, RxNorm Extension | |
Is a | Subsumes | ATC | |
ATC - SNOMED eq (reverse) | SNOMED - ATC eq (initial) | SNOMED | |
ATC to NDFRT eq (reverse) | NDFRT to ATC eq (initial) | NDFRT | |
ATC to VA Class eq (reverse) | VA Class to ATC eq (initial) | VA Class | |