Data dictionary for Study Schema - IntegratedBreedingPlatform/Documentation GitHub Wiki

Introduction

Study module (also known as DMS - Data Management System) of BMS is based on Chado natural diversity module. Here we will focus on key table structures and how they are used in BMS.

It is highly recommended that following conceptual and background information documents be read first:

project Table

Column Name Description Type Size (bytes)
project_id DB Record ID of a dataset. One study (Nursery/Trial) comprises of three datasets (STUDY, PLOT and ENVIRONMENT). Integer 4
name Name of the dataset. Text 255
description Description of the dataset Text 255
program_uuid Unique ID of the program record in workbench_project table in workbench database. Text 36
deleted Indicates if the project is deleted. tinyint 1
start_date Start date of the project. Varchar 8
end_date End date of the project. Varchar 8
study_update Date of the update of the project. Varchar 8
objective Objective of the project. Varchar 255
created_by Id of the creator of the project. Varchar 255
study_id ID of the project record the dataset belongs to (if dataset record) int 4
study_type_id Study type of the project (applicable for study record) int 4
parent_project_id ID of the parent project record (parent dataset, parent study, parent folder) int 4
dataset_type_id Dataset Type of the project (applicable for dataset record) int 4
locked Indicates if the project is locked from editing by other users tinyint 1
generation_level Indicates the cross generation level for the entries associated to the study int 11

Following example shows the three (dataset) rows in project table that are inserted when a trial (named T1) is created in BMS:

+------------+----------------+----------------+--------------------------------------+-------+----------+--------+--------+-----------+-----------+-----------+
| project_id | name           | description    | program_uuid                         |deleted|start_date|end_date|study_update|objective|created_by|study_type|
+------------+----------------+----------------+--------------------------------------+-------+----------+--------+--------+-----------+-----------+-----------+
|      25007 | T1             | T1             | 36ad2439-9e16-47ae-bf72-355be4c1a764 |0      |20160201|20180901|20180901|objective_test | 1      |   6     |
|      25008 | T1-ENVIRONMENT | T1-ENVIRONMENT | 36ad2439-9e16-47ae-bf72-355be4c1a764 |0      |NULL    |NULL    |NULL    |NULL           |NULL    | NULL    |
|      25009 | T1-PLOTDATA    | T1-PLOTDATA    | 36ad2439-9e16-47ae-bf72-355be4c1a764 |0      |NULL    |NULL    |NULL    |NULL           |NULL    | NULL    |
+------------+----------------+----------------+--------------------------------------+-------+----------+--------+--------+-----------+-----------+-----------+
***

projectprop Table

Stores name/value pair style properties for each of the dataset in project table.

Column Name Description Type Size (bytes)
projectprop_id Primary key. DB record id. Integer 4
project_id Foreign key to the project table record of the dataset. Integer 4
type_id Foreign key to the cvterm table. Represents the "name" part of the name/value property. Integer 4
value Value of the property Text 255
rank One property is described in this table with multiple rows. Rank is used to group related properties. Integer 4
variable_id Id of the variable in cvterm Integer 11
alias Alias of the variable Varchar 255
name_fldno Foreign key to the udflds table Integer 11

nd_experiment Table

From Chado wiki:

This is the core table for the natural diversity module, representing each individual assay that is undertaken (nb this is usually not an entire experiment). Each nd_experiment should give rise to a single genotype or phenotype and be described via 1 (or more) protocols. Collections of assays that relate to each other should be linked to the same record in the project table. Experiment.type is a cvterm that will define which records are expected for other tables. Any CV may be used but it was designed with terms such as: [phenotype_assay, genotype_assay, field_collection, cross_experiment] in mind.

Each row in the table represents one observation unit in a study against which many measurements/observations are recorded. Equates to a plot.

Column Name Description Type Size (bytes)
nd_experiment_id Primary key. DB record id. Integer 4
nd_geolocation_id Foreign key joining to the nd_geolocation table. One study can have many geo locations e.g. in multi-location international trials. Integer 4
type_id Foriegn key to cvterm that describes the type of observation unit e.g. Plot, environment etc. Integer 4
obs_unit_id Observation Unit id. Varchar 36
project_id Foreign key to project table. Integer 11
stock_id Foreign key to stock table. Integer 11
parent_id Parent dataset Id. Integer 11
observation_unit_no Observation Unit Number. Integer

nd_experimentprop Table

Stores name/value pair style properties for each of experiment units in nd_experiment table.

Column Name Description Type Size (bytes)
nd_experimentprop_id Primary key. DB record id. Integer 4
nd_experiment_id Forieng key to nd_experiment table record for which this record defines a property. Integer 4
type_id Foreign key to the cvterm table. Represents the "name" part of the name/value property. Integer 4
value Value of the property Text 255
rank One property could be described in this table with multiple rows. Rank is used to group related properties. Integer 4

phenotype Table

This is where measurements, trait observations are stored for each observation unit in study.

Column Name Description Type Size (bytes)
phenotype_id Primary key. DB record id. Integer 4
uniquename Probably not used. Text 255
name Always same as observable_id in but text form. TODO insert explanation from Graham's email here. Text 255
observable_id Foreign key to cvterm table which represents the variable (trait) being measured. Integer 4
attr_id Probably not used Integer 4
value Value of the trait or characteristic observed. Text 255
cvalue_id Cvterm of categorical variable value Integer 4
assay_id Probably not used Integer 4
nd_experiment_id Foreign key to nd_experiment table Integer 4
created_date Creation Date TIMESTAMP
updated_date Modification Date TIMESTAMP
draft_value Draft value of the trait or characteristic observed. Text 255
draft_cvalue_id Cvterm of categorical variable value Integer 4
created_by ID of user who created the phenotype observation Integer 4
updated_by ID of user who updated the phenotype observation Integer 4
json_props Stores additional info of an observation in json format LONGTEXT

phenotype_outlier Table

Column Name Description Type Size (bytes)
phenotype_outlier_id Primary key. DB record id. Integer 4
phenotype_id Foreign key to the phenotype table Integer 4
value Value of the outlying data for the phenotype. Text 45
date_modified Date last modified. Integer 4

stock table

Stores information about the Germplasm material used in the study. Chado stock module wiki.

Column Name Description Type Size (bytes)
stock_id Primary key. DB record id. Integer 4
dbxref_id Foreign key to the germplsm table Integer 4
organism_id Probably unused. Integer 4
uniquename Entry # of the germplasm in the list used for study Text 255
description Probably unused.TODO Confirm. Text 255
is_obsolete Probably unused. TODO Confirm. Tiny Int 1
project_id Foreign key to table project. The study record the stock belongs to Integer 11
cross_value Cross for the germplasm in the entry Text 5000

stockprop table

Name/Value pair style propertied associated with a study germplasm.

Column Name Description Type Size (bytes)
stockprop_id Primary key. DB record id. Integer 4
stock_id Foreign key to the stock table. Integer 4
type_id Foreign key to the cvterm table, representing the "name" part of the property Integer 4
value Value of the stock property Text 255
cvalue_id Term id of the categorical value Integer 11

nd_geolocation Table

Column Name Description Type Size (bytes)
nd_geolocation_id Primary key. DB record id. Integer 4
description Description of the geographic location Text 255
latitude Latitude. Float 4
longitude Longitude. Float 4
geodetic_datum Geodetic datum. Text 32
altitude Altitude. Float 4

nd_geolocationprop Table

Name/Value pair style propertied associated with a study location/instance.

Column Name Description Type Size (bytes)
nd_geolocationprop_id Primary key. DB record id. Integer 4
nd_geolocation_id Foreign key to the nd_geolocation table. Integer 4
type_id Foreign key to the cvterm table, representing the "name" part of the property Integer 4
value Value of the geolocation property Text 255
rank Groups related geo location properties Integer 4

sample_list Table

This contains information about a list of sampled plants in study.

Column Name Description Type Size (bytes)
list_id Primary key. DB record id. Integer 4
list_name Name of the list Text 100
description Title or description of the list Text 255
hierarchy Parent folder ID of the list Integer 4
created_date Date when list was created Timestamp
notes Additional information about the list Text
created_by ID of user who created the list Integer 4
type Type of List Text 30
program_uuid Unique ID of the program record in workbench_project table in workbench database Text 36

sample Table

This contains information plants sampled in study.

Column Name Description Type Size (bytes)
sample_id Primary key. DB record id. Integer 4
entry_no Entry Number in sampled list Integer 4
sample_name Name of sample Text 100
taken_by ID of user who took sample Integer 4
sampling_date Date sample was taken Date
created_date Timestamp when record was created Timestamp
sample_bk Unique ID of the sample Text 45
sample_list ID of sample list it belongs to Integer 4
created_by ID of user who created sample Integer 4
plate_id Text 255
well Text 255
nd_experiment_id ID of observation unit the sample belongs to Integer 4
sample_no Sample Number within each related observation unit Integer 4

study_type Table

This contains values for different types of study.

Column Name Description Type Size (bytes)
study_type_id Primary key. DB record id. Integer 4
label Description of study type Text 45
name Name of Study type Text 45
cvterm_id ID in cvterm table of record when study types still existed in ontology Integer 4
visible Indicator if study type is visible in system TINYINT 4

dataset_type Table

This contains values for different types of dataset.

Column Name Description Type Size (bytes)
dataset_type_id Primary key. DB record id. Integer 4
name Name of dataset type Text 100
description Description of dataset type Text 1024
cvterm_id ID in cvterm table of record when dataset types still existed in ontology Integer 4
is_subobs_type Indicator if dataset type is sub-observation dataset type TINYINT 4
is_obs_type Indicator if dataset type is observation dataset type TINYINT 4

germplasm_study_source Table

Store relationships between crosses/advances and its source experimental unit and study.

Existing cross lists and advance lists were inserted into this table, however with only the source study(project_id column) populated. The source experimental unit (nd_experiment_id column) could not be deduced and hence is empty.

Column Name Description Type Size (bytes)
germplasm_study_source_id Primary key. DB record id. Integer 4
gid Foreign Key to the germplasm table Integer 4
project_id Foreign key to the project table Integer 4
nd_experiment_id Foreign key to the nd_experiment table Integer 4
type Germplasm Study Source Type: ADVANCE/CROSS Text 45