Data dictionary for Study Schema - IntegratedBreedingPlatform/Documentation GitHub Wiki
Introduction
Study module (also known as DMS - Data Management System) of BMS is based on Chado natural diversity module. Here we will focus on key table structures and how they are used in BMS.
It is highly recommended that following conceptual and background information documents be read first:
- Chado natural diversity module wiki.
- Word document Phenotyping Data Management System for IBDB-20150430.docx for the conceptual understanding of the phenotyping related design concepts.
project Table
Column Name | Description | Type | Size (bytes) |
---|---|---|---|
project_id | DB Record ID of a dataset. One study (Nursery/Trial) comprises of three datasets (STUDY, PLOT and ENVIRONMENT). | Integer | 4 |
name | Name of the dataset. | Text | 255 |
description | Description of the dataset | Text | 255 |
program_uuid | Unique ID of the program record in workbench_project table in workbench database. | Text | 36 |
deleted | Indicates if the project is deleted. | tinyint | 1 |
start_date | Start date of the project. | Varchar | 8 |
end_date | End date of the project. | Varchar | 8 |
study_update | Date of the update of the project. | Varchar | 8 |
objective | Objective of the project. | Varchar | 255 |
created_by | Id of the creator of the project. | Varchar | 255 |
study_id | ID of the project record the dataset belongs to (if dataset record) | int | 4 |
study_type_id | Study type of the project (applicable for study record) | int | 4 |
parent_project_id | ID of the parent project record (parent dataset, parent study, parent folder) | int | 4 |
dataset_type_id | Dataset Type of the project (applicable for dataset record) | int | 4 |
locked | Indicates if the project is locked from editing by other users | tinyint | 1 |
generation_level | Indicates the cross generation level for the entries associated to the study | int | 11 |
Following example shows the three (dataset) rows in project table that are inserted when a trial (named T1) is created in BMS:
+------------+----------------+----------------+--------------------------------------+-------+----------+--------+--------+-----------+-----------+-----------+
| project_id | name | description | program_uuid |deleted|start_date|end_date|study_update|objective|created_by|study_type|
+------------+----------------+----------------+--------------------------------------+-------+----------+--------+--------+-----------+-----------+-----------+
| 25007 | T1 | T1 | 36ad2439-9e16-47ae-bf72-355be4c1a764 |0 |20160201|20180901|20180901|objective_test | 1 | 6 |
| 25008 | T1-ENVIRONMENT | T1-ENVIRONMENT | 36ad2439-9e16-47ae-bf72-355be4c1a764 |0 |NULL |NULL |NULL |NULL |NULL | NULL |
| 25009 | T1-PLOTDATA | T1-PLOTDATA | 36ad2439-9e16-47ae-bf72-355be4c1a764 |0 |NULL |NULL |NULL |NULL |NULL | NULL |
+------------+----------------+----------------+--------------------------------------+-------+----------+--------+--------+-----------+-----------+-----------+
***
projectprop Table
Stores name/value pair style properties for each of the dataset in project table.
Column Name | Description | Type | Size (bytes) |
---|---|---|---|
projectprop_id | Primary key. DB record id. | Integer | 4 |
project_id | Foreign key to the project table record of the dataset. | Integer | 4 |
type_id | Foreign key to the cvterm table. Represents the "name" part of the name/value property. | Integer | 4 |
value | Value of the property | Text | 255 |
rank | One property is described in this table with multiple rows. Rank is used to group related properties. | Integer | 4 |
variable_id | Id of the variable in cvterm | Integer | 11 |
alias | Alias of the variable | Varchar | 255 |
name_fldno | Foreign key to the udflds table | Integer | 11 |
nd_experiment Table
From Chado wiki:
This is the core table for the natural diversity module, representing each individual assay that is undertaken (nb this is usually not an entire experiment). Each nd_experiment should give rise to a single genotype or phenotype and be described via 1 (or more) protocols. Collections of assays that relate to each other should be linked to the same record in the project table. Experiment.type is a cvterm that will define which records are expected for other tables. Any CV may be used but it was designed with terms such as: [phenotype_assay, genotype_assay, field_collection, cross_experiment] in mind.
Each row in the table represents one observation unit in a study against which many measurements/observations are recorded. Equates to a plot.
Column Name | Description | Type | Size (bytes) |
---|---|---|---|
nd_experiment_id | Primary key. DB record id. | Integer | 4 |
nd_geolocation_id | Foreign key joining to the nd_geolocation table. One study can have many geo locations e.g. in multi-location international trials. | Integer | 4 |
type_id | Foriegn key to cvterm that describes the type of observation unit e.g. Plot, environment etc. | Integer | 4 |
obs_unit_id | Observation Unit id. | Varchar | 36 |
project_id | Foreign key to project table. | Integer | 11 |
stock_id | Foreign key to stock table. | Integer | 11 |
parent_id | Parent dataset Id. | Integer | 11 |
observation_unit_no | Observation Unit Number. | Integer |
nd_experimentprop Table
Stores name/value pair style properties for each of experiment units in nd_experiment table.
Column Name | Description | Type | Size (bytes) |
---|---|---|---|
nd_experimentprop_id | Primary key. DB record id. | Integer | 4 |
nd_experiment_id | Forieng key to nd_experiment table record for which this record defines a property. | Integer | 4 |
type_id | Foreign key to the cvterm table. Represents the "name" part of the name/value property. | Integer | 4 |
value | Value of the property | Text | 255 |
rank | One property could be described in this table with multiple rows. Rank is used to group related properties. | Integer | 4 |
phenotype Table
This is where measurements, trait observations are stored for each observation unit in study.
Column Name | Description | Type | Size (bytes) |
---|---|---|---|
phenotype_id | Primary key. DB record id. | Integer | 4 |
uniquename | Probably not used. | Text | 255 |
name | Always same as observable_id in but text form. TODO insert explanation from Graham's email here. | Text | 255 |
observable_id | Foreign key to cvterm table which represents the variable (trait) being measured. | Integer | 4 |
attr_id | Probably not used | Integer | 4 |
value | Value of the trait or characteristic observed. | Text | 255 |
cvalue_id | Cvterm of categorical variable value | Integer | 4 |
assay_id | Probably not used | Integer | 4 |
nd_experiment_id | Foreign key to nd_experiment table | Integer | 4 |
created_date | Creation Date | TIMESTAMP | |
updated_date | Modification Date | TIMESTAMP | |
draft_value | Draft value of the trait or characteristic observed. | Text | 255 |
draft_cvalue_id | Cvterm of categorical variable value | Integer | 4 |
created_by | ID of user who created the phenotype observation | Integer | 4 |
updated_by | ID of user who updated the phenotype observation | Integer | 4 |
json_props | Stores additional info of an observation in json format | LONGTEXT |
phenotype_outlier Table
Column Name | Description | Type | Size (bytes) |
---|---|---|---|
phenotype_outlier_id | Primary key. DB record id. | Integer | 4 |
phenotype_id | Foreign key to the phenotype table | Integer | 4 |
value | Value of the outlying data for the phenotype. | Text | 45 |
date_modified | Date last modified. | Integer | 4 |
stock table
Stores information about the Germplasm material used in the study. Chado stock module wiki.
Column Name | Description | Type | Size (bytes) |
---|---|---|---|
stock_id | Primary key. DB record id. | Integer | 4 |
dbxref_id | Foreign key to the germplsm table | Integer | 4 |
organism_id | Probably unused. | Integer | 4 |
uniquename | Entry # of the germplasm in the list used for study | Text | 255 |
description | Probably unused.TODO Confirm. | Text | 255 |
is_obsolete | Probably unused. TODO Confirm. | Tiny Int | 1 |
project_id | Foreign key to table project. The study record the stock belongs to | Integer | 11 |
cross_value | Cross for the germplasm in the entry | Text | 5000 |
stockprop table
Name/Value pair style propertied associated with a study germplasm.
Column Name | Description | Type | Size (bytes) |
---|---|---|---|
stockprop_id | Primary key. DB record id. | Integer | 4 |
stock_id | Foreign key to the stock table. | Integer | 4 |
type_id | Foreign key to the cvterm table, representing the "name" part of the property | Integer | 4 |
value | Value of the stock property | Text | 255 |
cvalue_id | Term id of the categorical value | Integer | 11 |
nd_geolocation Table
Column Name | Description | Type | Size (bytes) |
---|---|---|---|
nd_geolocation_id | Primary key. DB record id. | Integer | 4 |
description | Description of the geographic location | Text | 255 |
latitude | Latitude. | Float | 4 |
longitude | Longitude. | Float | 4 |
geodetic_datum | Geodetic datum. | Text | 32 |
altitude | Altitude. | Float | 4 |
nd_geolocationprop Table
Name/Value pair style propertied associated with a study location/instance.
Column Name | Description | Type | Size (bytes) |
---|---|---|---|
nd_geolocationprop_id | Primary key. DB record id. | Integer | 4 |
nd_geolocation_id | Foreign key to the nd_geolocation table. | Integer | 4 |
type_id | Foreign key to the cvterm table, representing the "name" part of the property | Integer | 4 |
value | Value of the geolocation property | Text | 255 |
rank | Groups related geo location properties | Integer | 4 |
sample_list Table
This contains information about a list of sampled plants in study.
Column Name | Description | Type | Size (bytes) |
---|---|---|---|
list_id | Primary key. DB record id. | Integer | 4 |
list_name | Name of the list | Text | 100 |
description | Title or description of the list | Text | 255 |
hierarchy | Parent folder ID of the list | Integer | 4 |
created_date | Date when list was created | Timestamp | |
notes | Additional information about the list | Text | |
created_by | ID of user who created the list | Integer | 4 |
type | Type of List | Text | 30 |
program_uuid | Unique ID of the program record in workbench_project table in workbench database | Text | 36 |
sample Table
This contains information plants sampled in study.
Column Name | Description | Type | Size (bytes) |
---|---|---|---|
sample_id | Primary key. DB record id. | Integer | 4 |
entry_no | Entry Number in sampled list | Integer | 4 |
sample_name | Name of sample | Text | 100 |
taken_by | ID of user who took sample | Integer | 4 |
sampling_date | Date sample was taken | Date | |
created_date | Timestamp when record was created | Timestamp | |
sample_bk | Unique ID of the sample | Text | 45 |
sample_list | ID of sample list it belongs to | Integer | 4 |
created_by | ID of user who created sample | Integer | 4 |
plate_id | Text | 255 | |
well | Text | 255 | |
nd_experiment_id | ID of observation unit the sample belongs to | Integer | 4 |
sample_no | Sample Number within each related observation unit | Integer | 4 |
study_type Table
This contains values for different types of study.
Column Name | Description | Type | Size (bytes) |
---|---|---|---|
study_type_id | Primary key. DB record id. | Integer | 4 |
label | Description of study type | Text | 45 |
name | Name of Study type | Text | 45 |
cvterm_id | ID in cvterm table of record when study types still existed in ontology | Integer | 4 |
visible | Indicator if study type is visible in system | TINYINT | 4 |
dataset_type Table
This contains values for different types of dataset.
Column Name | Description | Type | Size (bytes) |
---|---|---|---|
dataset_type_id | Primary key. DB record id. | Integer | 4 |
name | Name of dataset type | Text | 100 |
description | Description of dataset type | Text | 1024 |
cvterm_id | ID in cvterm table of record when dataset types still existed in ontology | Integer | 4 |
is_subobs_type | Indicator if dataset type is sub-observation dataset type | TINYINT | 4 |
is_obs_type | Indicator if dataset type is observation dataset type | TINYINT | 4 |
germplasm_study_source Table
Store relationships between crosses/advances and its source experimental unit and study.
Existing cross lists and advance lists were inserted into this table, however with only the source study(project_id column) populated. The source experimental unit (nd_experiment_id column) could not be deduced and hence is empty.
Column Name | Description | Type | Size (bytes) |
---|---|---|---|
germplasm_study_source_id | Primary key. DB record id. | Integer | 4 |
gid | Foreign Key to the germplasm table | Integer | 4 |
project_id | Foreign key to the project table | Integer | 4 |
nd_experiment_id | Foreign key to the nd_experiment table | Integer | 4 |
type | Germplasm Study Source Type: ADVANCE/CROSS | Text | 45 |