Data Browser - stjude/proteinpaint GitHub Wiki
The Data Browser UI is designed to validate data architecture and values for custom cohorts.
The Data Browser UI accepts a user-supplied data dictionary. The app automatically detects whether the dictionary uses the phenotree format or the terms table format.
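The wiki does not describe how the app distinguishes the two formats. As a rough illustration only, a hypothetical `detect_dictionary_format` helper could inspect the header line for a `parent_id` column, which only the terms table format described here uses. This is a sketch under that assumption, not the Data Browser's actual detection logic.

```python
def detect_dictionary_format(header_line: str) -> str:
    """Guess the dictionary format from its tab-delimited header line.

    Hypothetical heuristic: only the terms table format has a parent_id
    column, so anything else is treated as a phenotree file.
    """
    # Header names are case-insensitive, so normalize to lowercase.
    columns = {c.strip().lower() for c in header_line.rstrip("\r\n").split("\t")}
    return "terms_table" if "parent_id" in columns else "phenotree"
```

For example, `detect_dictionary_format("term_id\tparent_id\tname\ttype\tvalues")` returns `"terms_table"`.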
Data requirements for both formats:
- All term_ids must be unique.
- Terms must not appear at different levels of the same parent branch. For example, in the phenotree format, the two rows below are invalid together: under term A, term C appears at Level_3 in the first row but at Level_2 in the second, and term D likewise appears at both Level_4 and Level_3.
| Level_1 | Level_2 | Level_3 | Level_4 |
|---|---|---|---|
| term A | term B | term C | term D |
| term A | term C | term D | - |
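Both requirements can be checked mechanically. The sketch below uses two hypothetical helpers: `find_duplicate_ids` reports repeated term IDs, and `find_level_conflicts` reports terms that sit at more than one level within a branch. It assumes each row has already been reduced to its list of level labels (Level_1 first) and that a branch is identified by its Level_1 term; both are simplifying assumptions for illustration.

```python
from collections import defaultdict


def find_duplicate_ids(term_ids):
    """Return the sorted list of term IDs that occur more than once."""
    seen, dupes = set(), set()
    for tid in term_ids:
        if tid in seen:
            dupes.add(tid)
        seen.add(tid)
    return sorted(dupes)


def find_level_conflicts(paths):
    """Return {(branch, term): sorted levels} for terms found at more than one level.

    Each path lists term labels from Level_1 downward, e.g. ["term A", "term B"].
    Assumption: a branch is identified by its Level_1 term.
    """
    levels = defaultdict(set)
    for path in paths:
        if not path:
            continue
        branch = path[0]
        for depth, term in enumerate(path, start=1):
            levels[(branch, term)].add(depth)
    return {key: sorted(depths) for key, depths in levels.items() if len(depths) > 1}
```

Run on the two example rows, `find_level_conflicts` flags term C (levels 2 and 3) and term D (levels 3 and 4) under term A.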
Phenotree format: a tab-delimited file with a header line, where each subsequent line is a variable. The allowed column headers are listed below; header names are case-insensitive.
- Variable: Required. Variable identifiers. Must be unique.
- type: Required. Defines the variable type. Must be one of: integer, float, categorical, survival, condition.
- Level_[##]: Optional. Creates the hierarchy (i.e., tree branches). Order the level columns left to right, from highest to lowest level, with no extraneous columns in between, and name them using the convention Level_[##].
- Categories: Optional. The value is a stringified JSON object that specifies category labels, show/hide settings, and other options for individual terms as key-value pairs. Example: {"1":{"label":"Yes"}, "0":{"label":"No"}}. Applies to both categorical and numerical terms; for a numerical term, it defines uncomputable categories. Options for each category:
  - label: STR. Required. The value is displayed as the label in charts.
  - uncomputable: true/false. Optional. true removes that value from chart displays.
- Unit: Optional. Specifies the unit for numeric variables. Only applies to numeric terms.
- Additional optional column: the value is a stringified JSON object. Any properties in it are appended to the term object.
Example (shown as a table; the file itself is tab-delimited):
| Level_1 | Level_2 | Level_3 | Level_4 | Variable | type | Categories | Unit |
|---|---|---|---|---|---|---|---|
| Genomic Profiling Status | Whole Genome Sequencing | - | - | wgs_sequenced | categorical | {"1":{"label":"Yes"}} | |
| Genomic Profiling Status | SNP Array 6.0 | - | - | snp6_genotyped | categorical | {"1":{"label":"Yes"}, "0":{"label":"No"}} | |
| Cancer-related Variables | Treatment | Alkylating Agents, mg/m2 | Cyclophosphamide | cyclophosphamide_5 | float | {"0":{"label":"not exposed;"}, "-8888":{"label":"exposed, dose unknown"}, "-9999":{"label":"unknown"} } | |
| Cancer-related Variables | Treatment | Alkylating Agents, mg/m2 | Cumulative Alkylating Agents | aaclassic_5 | float | {"0":{"label":"not exposed;"}, "-8888":{"label":"exposed, dose unknown"}, "-9999":{"label":"unknown exposure"} } | |
| Genomic Profiling Status | Age (years) at SNP Array 6.0 sample collection | - | - | snp6_sample_age | integer | {"-994":{"label":"N/A:CCSS"}} | years |
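To make the column rules concrete, here is a minimal sketch of turning one phenotree line into a term object. The function name `parse_phenotree_line` and the output keys (`id`, `type`, `path`, `values`, `unit`) are hypothetical; the column names are taken from the example table above, and the real Data Browser may build its term objects differently.

```python
import json

# Allowed values for the type column, as listed above.
ALLOWED_TYPES = {"integer", "float", "categorical", "survival", "condition"}


def parse_phenotree_line(header_line, data_line):
    """Turn one tab-delimited phenotree line into a term dict (hypothetical shape)."""
    # Header names are case-insensitive, so normalize them to lowercase.
    header = [h.strip().lower() for h in header_line.rstrip("\r\n").split("\t")]
    cells = data_line.rstrip("\r\n").split("\t")
    row = dict(zip(header, (c.strip() for c in cells)))

    term_type = row.get("type", "")
    if term_type not in ALLOWED_TYPES:
        raise ValueError(f"invalid type {term_type!r} for {row.get('variable')!r}")

    # Level_[##] columns ordered by numeric suffix; blank and "-" cells are skipped.
    level_cols = sorted(
        (h for h in header if h.startswith("level_")),
        key=lambda h: int(h.split("_", 1)[1]),
    )
    path = [row[c] for c in level_cols if row.get(c, "") not in ("", "-")]

    term = {"id": row["variable"], "type": term_type, "path": path}
    if row.get("categories"):
        # The Categories cell is a stringified JSON object.
        term["values"] = json.loads(row["categories"])
    if row.get("unit"):
        term["unit"] = row["unit"]
    return term
```

Applied to the `snp6_sample_age` row, this yields a two-element path, a `values` entry for `-994`, and `unit` set to `years`.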
Use blanks or "-" only for non-applicable level columns. Do not put blanks or "-" between levels.
Don't:
| Level_1 | Level_2 | Level_3 |
|---|---|---|
| term A | term B | -- |
| -- | -- | term C |
The dashes in the second row will throw an error, because they leave blank levels above term C.
Do:
| Level_1 | Level_2 | Level_3 |
|---|---|---|
| term A | term B | - |
| term A | term B | term C |
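The rule amounts to: filled level cells must form an unbroken run starting at Level_1. A hypothetical `check_level_gaps` helper can enforce it by scanning left to right and failing as soon as a filled cell follows a blank one. As an assumption, it treats empty cells and any run of dashes ("-" or "--") as blank.

```python
def is_blank_level(cell):
    """Treat empty cells and dash placeholders ("-", "--") as non-applicable."""
    return cell.strip().strip("-") == ""


def check_level_gaps(levels):
    """Return an error message if a filled level follows a blank one, else None.

    levels: the Level_1, Level_2, ... cells of one row, in order.
    """
    seen_blank = False
    for i, cell in enumerate(levels, start=1):
        if is_blank_level(cell):
            seen_blank = True
        elif seen_blank:
            return f"Level_{i} has a value after a blank or '-' level"
    return None
```

Both "Do" rows pass, while the second "Don't" row is rejected at Level_3.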
Terms table format: a tab-delimited file with the following columns:
- term_id: Required. Must be unique.
- parent_id: Required. ID of the immediate parent term.
- name: Required. Label for the term.
- type: Required. One of:
  - non-graphable: applies to parent terms without values
  - categorical: string or uncomputable values
  - integer
  - float
- values: Optional. Value labels for the term, separated by semicolons. E.g. 1=Yes; 2=No
Below is an example of a tab-delimited data dictionary in the terms table format.
| term_id | parent_id | name | type | values |
|---|---|---|---|---|
| gps | root | Genomic Profiling Status | non graphable | |
| wgs_sequenced | gps | Whole Genome Sequencing | categorical | 1=Yes; |
| snp6_genotyped | gps | Affymetrix Genome-Wide Human SNP Array 6.0 | categorical | 1=Yes; 0=No; -994=N/A: CCSS |
| wgs_curated | gps | Whole Genome Sequencing Curated Variant Calls | categorical | 1=Yes; -9999=Pending review |
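The values column packs several `key=label` pairs into one cell. A minimal sketch of splitting it, using a hypothetical `parse_values` helper, is shown below. It splits on semicolons, tolerates a trailing semicolon (as in `1=Yes;`), and splits each pair on the first `=` only, so labels such as `N/A: CCSS` survive intact. Returning `{"label": ...}` objects mirrors the phenotree Categories JSON; that output shape is an assumption, not a documented conversion.

```python
def parse_values(field):
    """Parse a terms-table values cell such as '1=Yes; 0=No' into a categories dict."""
    categories = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty cells and a trailing semicolon, e.g. '1=Yes;'
        key, sep, label = item.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"expected key=label, got {item!r}")
        categories[key.strip()] = {"label": label.strip()}
    return categories
```

For the `snp6_genotyped` row, `parse_values("1=Yes; 0=No; -994=N/A: CCSS")` gives three categories keyed `"1"`, `"0"`, and `"-994"`.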
Data requirements:
- The parent_id of the term at the top of each branch must be root. For example, the parent_id values for the branch below are root, term_1, term_2, and term_3:
| term_id | parent_id | name |
|---|---|---|
| term_1 | root | Term 1 |
| term_2 | term_1 | Term 2 |
| term_3 | term_2 | Term 3 |
| term_4 | term_3 | Term 4 |
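Since every branch must start at root, one way to validate a terms table is to follow each term's parent_id chain upward and confirm that it ends at `root`. The hypothetical `check_parent_chain` helper below does this and also reports parent IDs that match no term_id, as well as cycles. It assumes term_ids are already unique, because building the lookup with `dict` would silently keep only the last row for a repeated ID.

```python
def check_parent_chain(rows):
    """Check that every term's parent_id chain ends at 'root'.

    rows: iterable of (term_id, parent_id) pairs.
    Returns a list of error strings (empty if all chains reach root).
    """
    parent = dict(rows)  # assumes term_ids are unique
    errors = []
    for term_id in parent:
        seen = set()
        current = term_id
        while current != "root":
            if current in seen:
                errors.append(f"{term_id}: cycle detected at {current}")
                break
            seen.add(current)
            if current not in parent:
                errors.append(f"{term_id}: unknown parent_id {current}")
                break
            current = parent[current]
    return errors
```

The four-term example branch passes with no errors, since term_4 resolves through term_3, term_2, and term_1 to root.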
The user interface is available from the Data Browser card on our homepage. Submitting the dictionary file for a custom cohort opens a new UI with a suite of tools for exploring the data. First, the dictionary appears.
The terms appear in a collapsible list, as shown in the example below. Terms with a white background are not linked to any data; clicking the "+" next to one shows the terms underneath it. Think of these terms as headers and subheaders for the collapsible list.
Terms shown as clickable blue pills are intended to link to data. The example below depicts a collapsible list with blue pills and white, non-data-linked terms.
**Clicking on a blue pill will show an error.**