Vocab. UK_BIOBANK - OHDSI/Vocabulary-v5.0 GitHub Wiki
OHDSI forums release post: link.
Online showcase of UK Biobank resources: link.
UK Biobank is a large-scale biomedical database and research resource containing in-depth genetic and health information from half a million UK participants.
All concepts were included except the following: Genomics data, Cardiac monitoring (measurement characteristics, such as ECG trace, acceleration, impedance and analysis date), Health related outcomes (except for a few fields coming from the HES dataset), bulk data (Invalid fields, DICOM files, chromosome genotype intensities, CRAM files, etc.).
Existing ICD9, ICD10 and OPCS4 concepts used as possible answers in the UK Biobank have also been excluded from the UK Biobank vocabulary in order to avoid concept duplication. Concepts of the SOC2000 vocabulary were also excluded, unless it is recognized as a separate OMOP vocabulary.
All Concepts are assigned the longest of all available names.
UK Biobank concepts have one distinct name per concept. However, the extended description of the question, which reveals its context, is stored in the UK Biobank notes. When converting the UK Biobank to OMOP, we preserved those descriptions, with slight modifications, in the concept_synonym table. The modifications consist of removing non-relevant words and details: ‘Question asked:...’, ‘Participant was asked:...’, ‘ACE touchscreen question:...’, etc.
Source | | OMOP | |
title | Low calorie drink intake | concept_name | Low calorie drink intake |
notes | Question asked: "How many glasses/cans of low calorie or diet drinks (e.g. fizzy, squash) did you drink yesterday?"<p>If the participant activated the Help feature they were shown the message:<p><i>Low calorie flavoured water should be recorded under low calorie drinks.</i> | concept_synonym_name | How many glasses/cans of low calorie or diet drinks (e.g. fizzy, squash) did you drink yesterday? |
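The clean-up of notes into synonyms can be illustrated with a minimal sketch. It is not the script used to build the vocabulary: the helper name `notes_to_synonym` is hypothetical, and the rule of keeping only the text before the first `<p>` tag is an assumption inferred from the single example in the table above.

```python
import re

# Lead-in phrases named in the text above; the actual conversion may strip more.
PREFIXES = (
    "Question asked:",
    "Participant was asked:",
    "ACE touchscreen question:",
)


def notes_to_synonym(notes: str) -> str:
    """Hypothetical helper: reduce a UK Biobank 'notes' value to a synonym."""
    # Assumption: help text follows the first <p>, so keep only what precedes it.
    text = re.split(r"<p>", notes, maxsplit=1, flags=re.IGNORECASE)[0]
    # Remove any remaining HTML tags such as <i>...</i>.
    text = re.sub(r"<[^>]+>", "", text).strip()
    # Drop a known lead-in phrase, if the text starts with one.
    for prefix in PREFIXES:
        if text.startswith(prefix):
            text = text[len(prefix):].strip()
            break
    # Strip the quotation marks that wrap the quoted question.
    return text.strip('"“”').strip()
```

Applied to the notes value in the table, this sketch returns the question text stored as concept_synonym_name.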
Concept names for pre-coordinated pairs were built by concatenation according to the following format:
‘Question name’: ‘Answer name’
‘Variable name’: ‘Value name’
In order to provide code uniqueness, for categories ‘c’ was added to category_id and used as concept_code. E.g. “c100078” for Biological samples.
For Biobank fields (Questions or Variables) field_id was used as concept_code. E.g. “30264” for Mean reticulocyte volume acquisition route.
For Answers/Values the combination of ‘encoding_id’-’value’ was used as concept_code. E.g. “1401-17” for Agoraphobia.
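The naming and coding rules above can be summarised in a short sketch. The function names are hypothetical and exist only for illustration; the formats follow the examples given in the text.

```python
def category_code(category_id: int) -> str:
    # Categories: 'c' is prepended to category_id, e.g. 100078 -> "c100078".
    return f"c{category_id}"


def field_code(field_id: int) -> str:
    # Fields (Questions or Variables): field_id is used unchanged, e.g. "30264".
    return str(field_id)


def answer_code(encoding_id: int, value) -> str:
    # Answers/Values: 'encoding_id'-'value', e.g. 1401 and 17 -> "1401-17".
    return f"{encoding_id}-{value}"


def pair_name(field_name: str, answer_name: str) -> str:
    # Pre-coordinated pairs: 'Question name': 'Answer name' concatenation.
    return f"{field_name}: {answer_name}"
```

Because the three code formats differ, a source identifier has to be passed through the matching rule before it is looked up as a concept_code.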
Domains for categories were assigned according to CDM specification. For each UK Biobank concept the respective domain was inferred from the concept’s category domain, unit and answer/value type provided by the source.
All concepts used as answers to questions are Observations.
There are also 4 Meas Values used as possible values for Measurements. These values are: False, True, Measure invalid, Measure not cleanly recoverable from data.
Category - all UK Biobank categories were assigned this concept class.
Question - the concept was assigned this concept class if it represents the question the participant was asked.
Variable - the concept was assigned this concept class if it belongs to the Measurement domain, represents technical details of a procedure/measurement, or represents a piece of information supplied by the provider.
Answer - the concept was assigned this concept class if it is related to a concept with the ‘Question’ concept_class.
Value - the concept was assigned this concept class if it is related to a concept with the ‘Variable’ concept_class.
Precoordinated pair - a new concept class for Question/answer or Variable/Value pairs was introduced in order to provide the mappings for combinations. These concepts are Non-standard and always have a mapping to Standard OMOP entities.
The standard_concept value was assigned according to the following rules:
Concept characteristic | standard_concept | Examples |
UK Biobank category | Classification | Alcohol |
Non-numeric Question/Variable without mapping | Standard | Added milk to espresso |
Non-numeric Question/Variable and Answers/Value with direct mapping provided separately by “Maps to” links | Non-standard | Treatment/medication code |
Non-numeric Question/Variable with full mapping equivalent provided through pre-coordinated pairs | Non-standard | HSV-1 seropositivity for Herpes Simplex virus-1 |
Non-numeric Question/Variable with mapping provided through pre-coordinated pairs | Standard | Noisy workplace |
Numeric Question/Variable without mapping and with at least one meaningful predefined Answer/Value | Standard | Lifetime number of depressed periods |
Numeric Question/Variable without at least one meaningful Answer/Value | Non-standard | Number of older siblings |
Answer/Value with relevant data underlined | Standard | False |
Answer/Value without relevant data underlined (flavours of NULL) | Non-standard | Measure invalid |
Pre-coordinated pairs are non-standard concepts but always have mapping to standard ones.
From | Relationship | To |
UK Biobank category | Category of | UK Biobank field (Question/Variable) |
UK Biobank field (Question/Variable) | Has answer | UK Biobank answer (Answer/Value) |
UK Biobank field (Question/Variable) | Maps to unit | OMOP Standardized units |
UK Biobank field (Question/Variable) | Has precoord pair | Precoordinated pair of Question/Answer or Variable/Value |
UK Biobank answer (Answer/Value) | Has precoord pair | Precoordinated pair of Question/Answer or Variable/Value |
Precoordinated pair of Question/Answer or Variable/Value | Maps to | OMOP Standardized concept |
- Take into account the letters added to the concept_code (such as the ‘c’ prefix for categories) and the style used to concatenate codes (‘encoding_id’-‘value’ for Answers/Values).
- Depending on the type of mapping delivery (direct “Maps to” links or through pre-coordinated pairs), specific JOINs should be used.
- Event dates should be extracted from the date/timestamp-type variables associated with variables of interest.
- Some variables are mapped to historical concepts based on the context, and for some of them the time period of the historical concept can be specified. To calculate the exact time period, use the values of such variables (for example, the value of the ‘Age angina diagnosed’ variable) or of associated variables (for example, for the ‘Cancer code, self-reported’ variable, the associated variable is ‘Interpolated Year when cancer first diagnosed’).
- “Maps to unit” links should be used for populating the unit_source_value and unit_concept_id fields.
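The time-period calculation mentioned in the list above could look like the following sketch. It rests on two assumptions that are not stated in this document: that the participant's year of birth is available alongside an age-type variable, and that interpolated-year variables hold a fractional year (for example 2005.5). Both function names are hypothetical.

```python
from datetime import date, timedelta


def window_from_age(year_of_birth: int, age_at_event: int) -> tuple[date, date]:
    """Widest window in which an event at a given whole-year age can fall.

    Assumption: only the year of birth is known, so the birthday may be
    any day of that year and the window spans two calendar years.
    """
    start = date(year_of_birth + age_at_event, 1, 1)
    end = date(year_of_birth + age_at_event + 1, 12, 31)
    return start, end


def fractional_year_to_date(year_fraction: float) -> date:
    """Convert a fractional year (assumed format, e.g. 2005.5) to a date."""
    year = int(year_fraction)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    # Truncate to whole days elapsed since 1 January of that year.
    return start + timedelta(days=int((year_fraction - year) * days_in_year))
```

The window returned by `window_from_age` is deliberately conservative; if a full date of birth is available, it can be narrowed to a single year starting on the relevant birthday.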