Data dictionary for Study Schema - IntegratedBreedingPlatform/Documentation GitHub Wiki

Introduction

Study module (also known as DMS - Data Management System) of BMS is based on Chado natural diversity module. Here we will focus on key table structures and how they are used in BMS.

It is highly recommended that following conceptual and background information documents be read first:

Chado natural diversity module wiki.
Word document Phenotyping Data Management System for IBDB-20150430.docx for the conceptual understanding of the phenotyping related design concepts.

project Table

Column Name	Description	Type	Size (bytes)
project_id	DB Record ID of a dataset. One study (Nursery/Trial) comprises of three datasets (STUDY, PLOT and ENVIRONMENT).	Integer	4
name	Name of the dataset.	Text	255
description	Description of the dataset	Text	255
program_uuid	Unique ID of the program record in workbench_project table in workbench database.	Text	36
deleted	Indicates if the project is deleted.	tinyint	1
start_date	Start date of the project.	Varchar	8
end_date	End date of the project.	Varchar	8
study_update	Date of the update of the project.	Varchar	8
objective	Objective of the project.	Varchar	255
created_by	Id of the creator of the project.	Varchar	255
study_id	ID of the project record the dataset belongs to (if dataset record)	int	4
study_type_id	Study type of the project (applicable for study record)	int	4
parent_project_id	ID of the parent project record (parent dataset, parent study, parent folder)	int	4
dataset_type_id	Dataset Type of the project (applicable for dataset record)	int	4
locked	Indicates if the project is locked from editing by other users	tinyint	1
generation_level	Indicates the cross generation level for the entries associated to the study	int	11

Following example shows the three (dataset) rows in project table that are inserted when a trial (named T1) is created in BMS:

+------------+----------------+----------------+--------------------------------------+-------+----------+--------+--------+-----------+-----------+-----------+
| project_id | name           | description    | program_uuid                         |deleted|start_date|end_date|study_update|objective|created_by|study_type|
+------------+----------------+----------------+--------------------------------------+-------+----------+--------+--------+-----------+-----------+-----------+
|      25007 | T1             | T1             | 36ad2439-9e16-47ae-bf72-355be4c1a764 |0      |20160201|20180901|20180901|objective_test | 1      |   6     |
|      25008 | T1-ENVIRONMENT | T1-ENVIRONMENT | 36ad2439-9e16-47ae-bf72-355be4c1a764 |0      |NULL    |NULL    |NULL    |NULL           |NULL    | NULL    |
|      25009 | T1-PLOTDATA    | T1-PLOTDATA    | 36ad2439-9e16-47ae-bf72-355be4c1a764 |0      |NULL    |NULL    |NULL    |NULL           |NULL    | NULL    |
+------------+----------------+----------------+--------------------------------------+-------+----------+--------+--------+-----------+-----------+-----------+
***

projectprop Table

Stores name/value pair style properties for each of the dataset in project table.

Column Name	Description	Type	Size (bytes)
projectprop_id	Primary key. DB record id.	Integer	4
project_id	Foreign key to the project table record of the dataset.	Integer	4
type_id	Foreign key to the cvterm table. Represents the "name" part of the name/value property.	Integer	4
value	Value of the property	Text	255
rank	One property is described in this table with multiple rows. Rank is used to group related properties.	Integer	4
variable_id	Id of the variable in cvterm	Integer	11
alias	Alias of the variable	Varchar	255
name_fldno	Foreign key to the udflds table	Integer	11

nd_experiment Table

From Chado wiki:

This is the core table for the natural diversity module, representing each individual assay that is undertaken (nb this is usually not an entire experiment). Each nd_experiment should give rise to a single genotype or phenotype and be described via 1 (or more) protocols. Collections of assays that relate to each other should be linked to the same record in the project table. Experiment.type is a cvterm that will define which records are expected for other tables. Any CV may be used but it was designed with terms such as: [phenotype_assay, genotype_assay, field_collection, cross_experiment] in mind.

Each row in the table represents one observation unit in a study against which many measurements/observations are recorded. Equates to a plot.

Column Name	Description	Type	Size (bytes)
nd_experiment_id	Primary key. DB record id.	Integer	4
nd_geolocation_id	Foreign key joining to the nd_geolocation table. One study can have many geo locations e.g. in multi-location international trials.	Integer	4
type_id	Foriegn key to cvterm that describes the type of observation unit e.g. Plot, environment etc.	Integer	4
obs_unit_id	Observation Unit id.	Varchar	36
project_id	Foreign key to project table.	Integer	11
stock_id	Foreign key to stock table.	Integer	11
parent_id	Parent dataset Id.	Integer	11
observation_unit_no	Observation Unit Number.	Integer

nd_experimentprop Table

Stores name/value pair style properties for each of experiment units in nd_experiment table.

Column Name	Description	Type	Size (bytes)
nd_experimentprop_id	Primary key. DB record id.	Integer	4
nd_experiment_id	Forieng key to nd_experiment table record for which this record defines a property.	Integer	4
type_id	Foreign key to the cvterm table. Represents the "name" part of the name/value property.	Integer	4
value	Value of the property	Text	255
rank	One property could be described in this table with multiple rows. Rank is used to group related properties.	Integer	4

phenotype Table

This is where measurements, trait observations are stored for each observation unit in study.

Column Name	Description	Type	Size (bytes)
phenotype_id	Primary key. DB record id.	Integer	4
uniquename	Probably not used.	Text	255
name	Always same as observable_id in but text form. TODO insert explanation from Graham's email here.	Text	255
observable_id	Foreign key to cvterm table which represents the variable (trait) being measured.	Integer	4
attr_id	Probably not used	Integer	4
value	Value of the trait or characteristic observed.	Text	255
cvalue_id	Cvterm of categorical variable value	Integer	4
assay_id	Probably not used	Integer	4
nd_experiment_id	Foreign key to nd_experiment table	Integer	4
created_date	Creation Date	TIMESTAMP
updated_date	Modification Date	TIMESTAMP
draft_value	Draft value of the trait or characteristic observed.	Text	255
draft_cvalue_id	Cvterm of categorical variable value	Integer	4
created_by	ID of user who created the phenotype observation	Integer	4
updated_by	ID of user who updated the phenotype observation	Integer	4
json_props	Stores additional info of an observation in json format	LONGTEXT

phenotype_outlier Table

Column Name	Description	Type	Size (bytes)
phenotype_outlier_id	Primary key. DB record id.	Integer	4
phenotype_id	Foreign key to the phenotype table	Integer	4
value	Value of the outlying data for the phenotype.	Text	45
date_modified	Date last modified.	Integer	4

stock table

Stores information about the Germplasm material used in the study. Chado stock module wiki.

Column Name	Description	Type	Size (bytes)
stock_id	Primary key. DB record id.	Integer	4
dbxref_id	Foreign key to the germplsm table	Integer	4
organism_id	Probably unused.	Integer	4
uniquename	Entry # of the germplasm in the list used for study	Text	255
description	Probably unused.TODO Confirm.	Text	255
is_obsolete	Probably unused. TODO Confirm.	Tiny Int	1
project_id	Foreign key to table project. The study record the stock belongs to	Integer	11
cross_value	Cross for the germplasm in the entry	Text	5000

stockprop table

Name/Value pair style propertied associated with a study germplasm.

Column Name	Description	Type	Size (bytes)
stockprop_id	Primary key. DB record id.	Integer	4
stock_id	Foreign key to the stock table.	Integer	4
type_id	Foreign key to the cvterm table, representing the "name" part of the property	Integer	4
value	Value of the stock property	Text	255
cvalue_id	Term id of the categorical value	Integer	11

nd_geolocation Table

Column Name	Description	Type	Size (bytes)
nd_geolocation_id	Primary key. DB record id.	Integer	4
description	Description of the geographic location	Text	255
latitude	Latitude.	Float	4
longitude	Longitude.	Float	4
geodetic_datum	Geodetic datum.	Text	32
altitude	Altitude.	Float	4

nd_geolocationprop Table

Name/Value pair style propertied associated with a study location/instance.

Column Name	Description	Type	Size (bytes)
nd_geolocationprop_id	Primary key. DB record id.	Integer	4
nd_geolocation_id	Foreign key to the nd_geolocation table.	Integer	4
type_id	Foreign key to the cvterm table, representing the "name" part of the property	Integer	4
value	Value of the geolocation property	Text	255
rank	Groups related geo location properties	Integer	4

sample_list Table

This contains information about a list of sampled plants in study.

Column Name	Description	Type	Size (bytes)
list_id	Primary key. DB record id.	Integer	4
list_name	Name of the list	Text	100
description	Title or description of the list	Text	255
hierarchy	Parent folder ID of the list	Integer	4
created_date	Date when list was created	Timestamp
notes	Additional information about the list	Text
created_by	ID of user who created the list	Integer	4
type	Type of List	Text	30
program_uuid	Unique ID of the program record in workbench_project table in workbench database	Text	36

sample Table

This contains information plants sampled in study.

Column Name	Description	Type	Size (bytes)
sample_id	Primary key. DB record id.	Integer	4
entry_no	Entry Number in sampled list	Integer	4
sample_name	Name of sample	Text	100
taken_by	ID of user who took sample	Integer	4
sampling_date	Date sample was taken	Date
created_date	Timestamp when record was created	Timestamp
sample_bk	Unique ID of the sample	Text	45
sample_list	ID of sample list it belongs to	Integer	4
created_by	ID of user who created sample	Integer	4
plate_id		Text	255
well		Text	255
nd_experiment_id	ID of observation unit the sample belongs to	Integer	4
sample_no	Sample Number within each related observation unit	Integer	4

study_type Table

This contains values for different types of study.

Column Name	Description	Type	Size (bytes)
study_type_id	Primary key. DB record id.	Integer	4
label	Description of study type	Text	45
name	Name of Study type	Text	45
cvterm_id	ID in cvterm table of record when study types still existed in ontology	Integer	4
visible	Indicator if study type is visible in system	TINYINT	4

dataset_type Table

This contains values for different types of dataset.

Column Name	Description	Type	Size (bytes)
dataset_type_id	Primary key. DB record id.	Integer	4
name	Name of dataset type	Text	100
description	Description of dataset type	Text	1024
cvterm_id	ID in cvterm table of record when dataset types still existed in ontology	Integer	4
is_subobs_type	Indicator if dataset type is sub-observation dataset type	TINYINT	4
is_obs_type	Indicator if dataset type is observation dataset type	TINYINT	4

germplasm_study_source Table

Store relationships between crosses/advances and its source experimental unit and study.

Existing cross lists and advance lists were inserted into this table, however with only the source study(project_id column) populated. The source experimental unit (nd_experiment_id column) could not be deduced and hence is empty.

Column Name	Description	Type	Size (bytes)
germplasm_study_source_id	Primary key. DB record id.	Integer	4
gid	Foreign Key to the germplasm table	Integer	4
project_id	Foreign key to the project table	Integer	4
nd_experiment_id	Foreign key to the nd_experiment table	Integer	4
type	Germplasm Study Source Type: ADVANCE/CROSS	Text	45