Data Dictionary - seattleflu/documentation GitHub Wiki

Data Dictionary for ID3C Tables

Below is a data dictionary for data stored in ID3C. Data is organized by its location within the ID3C schema. Relational linkages have been preserved where possible.

Database Directory

Warehouse
Receiving
Shipping
Operations
- Deliverables Log
- Test Quota

Warehouse

The warehouse schema is where raw data from projects ingested into ID3C lives. The data in this schema is standardized and analysis ready.

Encounter

Rows in the encounter table represent: An interaction with an individual to collect point-in-time information or samples

Encounter Columns

encounter_id: Internal id of this encounter; integer
identifier: External identifier for this encounter; case-sensitive; text
individual_id: Internal id of the individual who was encountered; integer
site_id: Internal id of the site where the encounter occurred; integer
encountered: When the encounter occurred; timestamp
details: Additional information about this encounter which does not have a place in the relational schema; jsonb
age: Age of individual at the time of this encounter; interval
created: When this encounter record was first processed by ID3C; timestamp
modified: When this encounter record was last modified; timestamp

Individual

Rows in the individual table represent: A single, real person (or other member of a population)

Individual Columns

individual_id: Internal id of this individual; integer
identifier: External identifier for this individual (e.g. study participant id); text
sex: Sex assigned at birth; assigned-sex (see fhir documentation here)
details: Additional information about this individual which does not have a place in the relational schema; jsonb
created: When this individual record was first processed by ID3C; timestamp
modified: When this individual record was last modified; timestamp

Sample

Rows in the sample table represent: A sample collected from an individual during a specific encounter

Sample Columns

sample_id: Internal id of this sample; integer
identifier: A unique external identifier assigned to this sample; text
encounter_id: The encounter where the sample was collected; integer
details: Additional information about this sample which does not have a place in the relational schema; jsonb
collection_identifier: A unique external identifier assigned to the collection of this sample; text
collected: The date when this sample was collected; date
created: When this sample was first processed by ID3C; timestamp
modified: When this sample was last modified; timestamp

Presence Absence

Rows in the presence_absence table represent: A presence/absence test result from an individual sample

Presence Absence Columns

presence_absence_id: Internal id of this presence/absence test; integer
identifer: External id of this presence/absence test; integer
sample_id: Internal id of the sample that is being tested for presence/absence of a target; integer
target_id: Internal id of the target for the presence/absence test; integer
present: The result of a presence/absence test. True if the target is found, False if not, null if inconclusive; boolean
details: The details for each replicate of a presence/absence test; jsonb
created: When this presence/absence test was first processed by ID3C; timestamp
modified: When this presence/absence test was last modified; timestamp

Organism

Rows in the organism table represent: Hierarchical classification of taxa

Organism Columns

organism_id: Internal id of this organism; used only for internal foreign keys; integer
lineage: Label tree of ancestors with this organism as the leaf/final label; ltree
identifiers: Key-value pairs of external identifiers (i.e. dbxrefs) for this organism; keys should use well-known database or ontology prefix (e.g. NCBITaxon); hstore
details: Additional information about this organism which does not have a place in the relational schema

Target

Rows in the target table represent: A thing (e.g. pathogen) that can be detected in presence-absence testing

Target Columns

target_id: Internal id of this target; integer
identifier: An identifier assigned to this target by the testing center, assumed to be stable and unique; text
control: Whether the this target is a control. True if it is, False if not; boolean
organism_id: Internal id of the organism detected by this target; most-specific available; integer

Location

Rows in the location table represent: Hierarchical geospatial locations. Note these locations use EPSG:4326, a latitude/longitude coordinate system used by traditional GPS services

Location Columns

location_id: Internal id of this location; integer
identifier: External identifier by which this location is known; citext
scale: Relative size or extent of this location (e.g. country, state, city); citext
hierarchy: Set of key-value pairs describing where this location fits within a hierarchy; hstore
point: Representative point geometry in WGS84 (EPSG:4326); geometry(Point,4326)
polygon: Multi-polygon geometry in WGS84 (EPSG:4326); geometry(MultiPolygon,4326)
simplified_polygon: Multi-polygon geometry in WGS84 (EPSG:4326) with reduced complexity/accuracy, intended for cartographic use; geometry(MultiPolygon,4326)
details: Additional information about this location which does not have a place in the relational schema; jsonb
created: When this location was first processed by ID3C; timestamp
modified: When this location was last modified; timestamp

Site

Rows in the site table represent: A real-world or virtual/logical location where individuals are encountered

Site Columns

site_id: Internal id of this site; integer
identifier: External identifier for this site; case-preserving but must be unique case-insensitively; text
details: Additional information about this site which does not have a place in the relational schema; jsonb

Encounter Location

Rows in the encounter_location table represent: The association between an encounter and location

Encounter Location Columns

encounter_id: Internal id of the encounter; integer
location_id: Internal id of the location; integer
relation: The relation between the encounter and location, e.g. collection site, residence, workplace, etc; citext
details: Additional information about this encounter-location which does not have a place in the relational schema; jsonb
created: When this encounter/location relation was first processed by ID3C; timestamp
modified: When this encounter/location relation was last modified; timestamp

Encounter Location Relation

Rows in the encounter_location_relation table represent: Controlled vocabulary for encounter/location relationships

Encounter Location Relation Columns

relation: A relation between an encounter and location, e.g. collection site, residence, workplace, etc; citext
priority: Arbitrary inter-relation ranking, where smaller numbers mean greater importance within this ID3C instance. Used to determine the default "primary" location related to an encounter when there is more than one; integer

Identifier Set

Rows in the identifier_set table represent: A conceptual group of identifiers used for a common purpose

Identifer Set Columns

identifier_set_id: Internal id of this identifier set; integer
name: The well-known, unique name of the set (e.g. collections-scan, collections-airs, etc.); text
description: A plain text description of the purpose and usage of this set; text
use: A standard identifier use type, e.g. sample, collection, clia; citext

Identifier

Rows in the identifier table represent: Globally unique ids and their CualID barcodes, used for tracking physical things

Identifier Columns

uuid: UUIDv4 that should be used in programmatic and downstream data processing; relates to both the identifier and collection-identifier of sample records; uuid
barcode: Short suffix of UUID (CualID) used as a tracking barcode; citext
identifier_set_id: Internal id of the identifier set this identifier belongs to; integer
generated: When this identifier was generated; timestamp

Identifier Set Use

Rows in the identifier_set_use table represent: Controlled vocabularly defining how an identifier set is used

Identifier Set Use Columns

use: A standard identifier use type, e.g. sample, collection, clia; citext
description: A plain text description of this identifier use type; text

Sequence Read Set

Rows in the sequence_read_set table represent: A set of references to sequence reads from an individual sample

Sequence Read Set Columns

sequence_read_set_id: Internal id of this set of sequence reads; integer
sample_id: Internal id of the sample from which the sequence reads were generated; integer
urls: External object store urls that hold the sequence reads; text[]
details: Additional information about this sequence read set which does not have a place in the relational schema; jsonb
created: When this sequence read set was first processed by ID3C; timestamp
modified: When this sequence read set was last modified; timestamp

Consensus Genome

Rows in the consensus_genome table represent: A whole consensus genome for an organism extracted from a sample that may have been generated from a sequence read set

Consensus Genome Columns

consensus_genome_id: Internal id of this consensus genome; integer
sample_id: Internal id of the sample from which this consensus genome was extracted; integer
organism_id: Internal id of the organism to which this consensus genome belongs to; integer
sequence_read_set_id: Internal id of the sequence read set that was used to generate this consensus genome; may be null if the genome was generated from an external source; integer
details: Additional information about this consensus genome which does not have a place in the relational schema; jsonb

Genomic Sequence

Rows in the genomic_sequence table represent: A genomic sequence from a genome. There may be multiple sequences per genome if the genome is segmented (e.g. Influenza genomes)

Genomice Sequence Columns

genomic_sequence_id: Internal id of this genomic sequence; integer
identifier: An external, unqiue identifier for this genomic sequence; text
segment: The segment of the genome that this genomic sequence represents; citext
seq: A succession of letters indicating the order of nucleotides of the genomic sequence; text
consensus_genome_id: Internal id of the consensus genome that this genomic sequence belongs to; integer
details: Additional information about this genomic sequence which does not have a place in the relational schema; jsonb
created: When this genomic sequence was first processed by ID3C; timestamp
modified: When this genomic sequence was last modified; timestamp

Kit

Rows in the kit table represent: A self-test kit connected to an encounter and 2 different samples

Kit Columns

kit_id: Internal id of this kit; integer
identifier: A unique exteranl identifier assigned to this kit; text
encounter_id: Internal id of the encounter to which this kit is associated; integer
rdt_sample_id: Internal id of the sample collected from the residual RDT extractant; integer
utm_sample_id: Internal id of the sample collected from the second swap; integer
details: Additional information about this kit which does not have a place in the relational schema; jsonb

Receiving

The receiving table stores non-relational (or minimally-relational) data before it is transformed and loaded into the warehouse by one of our ETL processes

Clinical

Rows in the clinical table represent: A set of clinical documents; append only

Clinical Columns

clinical_id: Internal id of this clinical record set; integer
document: JSON document created from a pre-processed Excel file; json
received: When this clinical document was received; timestamp
processing_log: Event log recording details of ETL into data warehouse; jsonb

Consensus Genome

Rows in the consensus_genome table represent: A set of consensus genome documents; append only

Consensus Genome Columns

consensus_genome_id: Internal id of this consensus genome; integer
document: Original consensus genome JSON document; json
received: When this consensus genome document was received; timestamp
processing_log: Event log recording details of ETL into data warehouse; jsonb

Enrollment

Rows in the enrollment table represent: A set of enrollment documents; append only

Enrollment Columns

enrollment_id: Internal id of this enrollment document; integer
document: Original enrollment JSON document; json
received: When this enrollment document was received; timestamp
processing_log: Event log recording details of ETL into data warehouse; jsonb

FHIR

Rows in the fhir table represent: A set of FHIR documents; append only

FHIR Columns

fhir_ud: Internal id of this fhir record set; integer
document: JSON document in FHIR format; json
received: When this FHIR document was received; timestamp
processing_log: Event log recording details of ETL into data warehouse; jsonb

Longitudinal

Rows in the longitudinal table represent: A set of longitudianl documents; append only

Longitudinal Columns

longitudinal_id: Internal id of this longitudinal document; integer
document: JSON document created from a pre-processed Excel file; json
received: When this longitudinal document was received; timestamp
processing_log: Event log recording details of ETL into data warehouse; jsonb

Manifest

Rows in the manifest table represent: A set of manifest records; append only

Manifest Columns

manifest_id: Internal id of this manifest document; integer
document: Manifest record as a JSON document; json
received: When this manifest document was received; timestamp
processing_log: Event log recording details of ETL into data warehouse; jsonb

Presence Absence

Rows in the presence_absence table represent: A set of presence/absence documents; append only

Presence Absence Columns

presence_absence_id: Internal id of this presence/absence document; integer
document: Original presence/absence JSON document; json
received: When this presence/absence document was received; timestamp
processing_log: Event log recording details of ETL into data warehouse; jsonb

REDCap DET

Rows in the redcap_det table represent: A set of REDCap DET request data documents; append only

REDCap DET Columns

redcap_det_id: Internal id of this REDCap DET document; integer
document: JSON document created from REDCap DET request data; json
received: When this REDCap DET document was received; timestamp
processing_log: Event log recording details of ETL into data warehouse; jsonb

Sequence Read Set

Rows in the sequence_read_set table represent: A set of references to sequence reads; append only

Sequence Read Set Columns

sequence_read_set_id: Internal id of this sequence read set document; integer
document: Original sequence read set JSON document; json
received: When this sequence read set document was received; timestamp
processing_log: Event log recording details of ETL into data warehouse; jsonb

Shipping

Outgoing warehouse data prepared for external consumers

FHIR Questionnaire Responses V1

fhir_questionnaire_responses_v1 is a view of FHIR Questionnaire Responses stored in an encounter's details

FHIR Questionnaire Responses V1 Columns

encounter_id: Internal id of the encounter this response is linked with; integer
link_id: Identifier of the question this response relates to; text
string_response: If the response to the question is a string, it is present in this field; text[]
boolean_response: If the response to the question is a boolean, it is present in this field; boolean
date_response: If the response to the question is a date, it is present in this field; text[]
integer_response: If the response to the question is an integer, it is present in this field; integer[]
numeric_response: If the response to the question is numeric, it is present in this field; numeric[]
code_response: If the response to this question is encoded, it is present in this field; text[]

Genome Submission Metadata V1

genome_submission_metadata_v1 is a view of minimal metadata used for consensus genome submissions

Genome Submission Metadata V1 Columns

sfs_sample_identifier: A unique external identifier assigned to this sample; text
sfs_sample_barcode: Short suffix of UUID (CualID) used as a tracking barcode, relates to the sample identifier; citext
sfs_collection_barcode: Short suffix of UUID (CualID) used as a tracking barcode, relates to the collection identifier; citext
collection_date: The date on which this sample was collected; timestamp
swab_type: The type of swab used to collect the sample (e.g. ans, mtp, np); text
source: The study group for which this sample belongs to (e.g. SCH, SFS, AIRS, etc.); text
puma: The puma code associated with the collection of this sample; text
county: The county to which the census tract associated with the collection of this sample; text
state: The state associated with the collection of this sample; text
baseline_surveillance: Whether or not this sample can be considered baseline surveillance; This includes samples that were randomly sampled for genomic surveillance, not identified in a targeted sampling effort, and/or sampled across targeted efforts in order to better represent a community; For more details see descriptions here; boolean

HCOV19 Presence Absence Result V1

hcov19_presence_absence_result_v1 is a view of hCov-19 Samples with Non-Clinical Presence / Absence Results

HCOV19 Presence Absence Result V1 Columns

presence_absence_id: Internal id of the presence / absence result this result is linked with; integer
result_ts: The timestamp associated with when this result was released. Note that this is the internal samplify timestamp (in PST) when the result was released if it is after 2022-07-18, the internal samplify timestamp (in GMT) if it is after 2022-06-09 (but before 2022-07-18), or the last modified date of the record if it is before 2022-06-09; timestamp
hcov19_result_release_date: The timestamp when this record was created; timestamp
target: Internal id of the target for the presence/absence test; integer
organism: Internal id of the organism detected by this target; most-specific available; integer

Incidence Model Observation V4

incidence_model_observation_v4 is a view of warehoused samples and important questionnaire responses for modeling and visualization

Incidence Model Observation V4 Columns

encounter: External identifier for this encounter; case-sensitive; text
encountered_week: The week during which this encounter occurred; text
site: External identifier for the site at which this encounter occurred; text
site_type: Classification of the role of a site (e.g. retrospective, clinical); text
individual: External identifier for this individual (e.g. study participant id); text
sex: Sex assigned at birth; assigned-sex (see fhir documentation here)
age_bin_fine: Five year range into which an individual's age at encounter falls; intervalrange
age_range_fine_lower: Lower bound of the above five year range; inclusive; numeric
age_range_fine_upper: Upper bound of the above five year range; exclusive; numeric
age_range_coarse: Coarse age range into which an individual's age at encounter falls; intervalrange
age_range_coarse_lower: Lower bound of the above coarse age range; inclusive; numeric
age_range_coarse_upper: Upper bound of the above coarse age range; exclusive; numeric
address_identifier: External identifier by which this residence location is known; citext
residence_census_tract: Census tract containing this residence in the location hierearchy; text
residence_puma: Puma code associated with this residence in the location hierarchy; text
flu_shot: Whether or not this individual had received a flu shot when encountered; boolean
symptoms: Symptoms experienced by the individual at the time of encounter; text array
manifest_origin: The origin of the sample (e.g. hmc_retro, sch_retro); text
swab_type: The type of swab used to collect the sample (e.g. ans, mtp, np); text
collection_matrix: The collection matrix method for this sample (e.g. dry, utm_vtm); text
sample: A unique external identifier assigned to this sample; text

Observation With Presence Absence Result V3

observation_with_presence_absence_result_v3 is a view joining the incidence_model_observation_v4, the presence_absence_result_v2, and the hcov19_observation_v1 views.

Observation With Presence Absence Result V3 Columns

target: Internal id of the target for the presence/absence observation; integer
organism: Internal id of the organism detected by this target; most-specific available; integer
present: The result of a presence/absence test. True if the target is found, False if not, null if inconclusive; boolean
presence: The result of a presence/absence test. 1 if the target is found, 0 if not, null if inconclusive; integer
device: The device this presence/absence test was run on (e.g. TaqmanQPCR, OpenArray); text
assay_type: The category of the assay for this test (e.g. Clia, Research); text
all other columns are defined above in the documentation for incidence_model_observation_v4.

Sample With Best Available Encounter Data V1

sample_with_best_available_encounter_data_v1 is a view of warehoused samples combined with best available envounter date and site details.

Sample With Best Available Encounter Data V1 Columns

sample_id: Internal id of this sample; integer
has_encounter_data: Whether or not this sample has associated encounter data; boolean
best_available_encounter_date: The date on which the best available encounter occurred; date
season: Season during which this encounter occurred; Y1 is prior to October 1st 2019, Y2 prior to November 1st 2020, Y3 prior to November 1st 2021, and Y4 prior to November 1st 2022; text
best_available_site_id: Internal identifier for the site where this encounter took place; integer
best_available_site: External identifier for the site where this encounter took place; text
best_available_site_type: Classification of the role of a site (e.g. retrospective, clinical) where this encounter took place; text
best_available_site_category: Classification of the category of a site (e.g. community, hospital) where this encounter took place; text

FHIR Encounter Details V2

fhir_encounter_details_v2 is a view of SCAN encounter data in the FHIR format.

FHIR Encounter Details V2 Columns

For additional details about columns in this view, check out the REDCap codebook for the project at this location. The full column list is excluded here due to its length.

SCAN Enrollments V1

scan_enrollments_v1 is a view of enrollment data from the SCAN project.

SCAN Enrollments V1 Columns

illness_questionnaire_date: The date on which the illness questionnaire for this enrollment was completed; text
scan_study_arm: The study arm under which this enrollment occurred; text
priority_code: The priority code, if it exists, used by the participant when creating this enrollment; text
puma: The puma code for the location associated with this enrollment record; text
neighborhood_district: The neighborhood associated with this location (e.g. ballard, downtown, lake_union); text
geo_location_name: The area name based off the Puma location associated with this enrollment. See descriptions; text
kit_received: Whether or not a kit was sent along with the enrollment for this participant; boolean

SCAN Encounters With Best Available Vaccination Data V1

scan_encounters_with_best_available_vaccination_data_v1 is a view of SCAN encounters with best available vaccination data transformation applied

SCAN Encounters With Best Available Vaccination Data V1 Columns

individual: External identifier for this individual (e.g. study participant id); text
encounter_id: Internal id of this encounter; integer
encountered: Date on which this encounter occurred; date
pt_entered_covid_vax: Whether or not an individual reported covid vaccine information for this encounter; text
pt_entered_covid_doses: The number of doses of a covid vaccine an individual reported having had for this encounter; text
calculated_covid_doses: The number of doses it was calculated an individual had at this encounter; text
calculated_vac_date_1: The calculated date at which an individual got their first dose of the vaccine (most common first dose date); text
calculated_vac_name_1: The calculated vaccine name of an individual's first dose (most common name); text
calculated_vac_date_2:The calculated date at which an individual got their second dose of the vaccine (most common second dose date); text
calculated_vac_name_2: The calculated vaccine name of an individual's second dose (most common name); text
calculated_vac_date_3: The calculated date at which an individual got their third dose of the vaccine (most common third dose date); text
calculated_vac_name_3: The calculated vaccine name of an individual's third dose (most common name); text
vac_date_1_out_of_range: Whether the first dose date is out of order given the other calculated dates; This can nullify or alter the calculated best_available_vac_status for an individual; integer
vac_date_2_out_of_range: Whether the second dose date is out of order given the other calculated dates; This can nullify or alter the calculated best_available_vac_status for an individual; integer
vac_date_3_out_of_range: Whether the third dose date is out of order given the other calculated dates; This can nullify or alter the calculated best_available_vac_status for an individual; integer
best_available_vac_status: The best available vaccination status for an individual given the calculations above; possible values are not_vaccinated, boosted, fully_vaccinated, invalid, partially_vaccindated, unknown; text

SCAN Encounters V1

scan_encounters_v1 is a view of encounter data from the SCAN project. Note that duplicate encounters in this view may exist if an encounter encompasses multiple tests. This can happen in rare occasions when 2 kits are sent out for an encounter and registered in REDCap, with one kit on the incident report and not-tested and one kit tested.

SCAN Encounters V1 Columns

For additional details about columns in this view, check out the REDCap codebook for the project at this location. The full column list is excluded here due to its length.

UW Reopening Encounters V1

uw_reopening_encounters_v1 is a view of encounter data tied to enrollment questionnaire data for HCT encounters.

UW Reopening Encounters V1 Columns

For additional details about columns in this view, check out the REDCap codebook for the project at this location. The full column list is excluded here due to its length.

UW Reopening Enrollment FHIR Encounter Details V1

uw_reopening_enrollment_fhir_encounter_details_v1 is a view of enrollment details in the FHIR format.

UW Reopening Enrollment FHIR Encounter Details V1 Columns

For additional details about the columns in this view, check out the REDCap codebook for the project at this location. The full column list is excluded here due to its length.

Operations

Deliverables Log

Rows in the deliverables_log table represent: Deliverables sent by either our Linelist or Return of Results jobs

Deliverables Log Columns

deliverables_log_id: Internal id of this deliverable; integer
sample_barcode: The sample barcode associated with this deliverable; citext
collection_barcode: The collection barcode associated with this deliverable; citext
details: Additional information about this deliverable which does not have a place in the relational schema; jsonb
process_name: The process which sent this deliverable; text
sent: When this deliverable was sent; timestamp

Test Quota

Rows in the test_quota table represent: Quota entries for HCT testing invites

Test Quota Columns

name: Name of the process using this quota; text
timespan: Timespan during which this quota is active; timestamp range
max: The maximum number of testing entries alloted to this entry timespan; integer
used: The number of testing entries currently used during this entry timespan; integer