Add a dataset - legumeinfo/ZZBrowse GitHub Wiki
How to add a dataset
Not necessarily in this order:
-
Make sure that your annotations file exists on the LIS data store.
-
Make sure that your genetic marker files (.gff3.gz, each with its .gff3.gz.tbi index) and your raw GWAS and/or QTL data files (.tsv.gz) exist on the LIS data store. ZZBrowse will use these to generate datasets from your raw data.
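As a quick sanity check before running anything, the pairing of marker files and their indexes can be verified from a plain list of file names. The sketch below is a standalone Python helper, not part of ZZBrowse; the function name and the sample file names are hypothetical, and how you obtain the listing from the data store is up to you.

```python
def missing_tabix_indexes(filenames):
    """Return the .gff3.gz files that have no matching .gff3.gz.tbi in the listing."""
    names = set(filenames)
    return sorted(
        name for name in names
        if name.endswith(".gff3.gz") and name + ".tbi" not in names
    )


# Hypothetical listing; real names come from your own data store directory.
listing = [
    "example.markers.gff3.gz",
    "example.markers.gff3.gz.tbi",
    "other.markers.gff3.gz",
    "example.gwas.tsv.gz",
]
print(missing_tabix_indexes(listing))  # ['other.markers.gff3.gz']
```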
-
To do: more thorough explanation of the data generation process ("Poor man's DSCensor").
- Run the first part of datasets.R to scan the data store to create lists of GWAS and QTL data files and marker files.
- The second part of datasets.R generates tables of traits and their ontology codes.
- Back up, then delete any outdated ZZBrowse GWAS and QTL datasets from www/config/data.
- Launch ZZBrowse (or restart shiny-server). It detects that the files deleted in the previous step no longer exist and regenerates them from the data store. This can take a long time, but only needs to be done once (per data update or disaster).
- Also run combine-gwas-qtl.R for each species to generate the combined GWAS-QTL datasets.
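The "back up, then delete" step can be done as a single move, so nothing is removed without a copy existing first. This is a minimal sketch in standalone Python, not part of ZZBrowse. The function name, the backup folder naming, and the glob pattern are all assumptions; this page does not say how the generated dataset files are named, so choose a pattern that matches only the outdated ones.

```python
import shutil
import time
from pathlib import Path


def back_up_and_remove(data_dir, backup_root, pattern="*"):
    """Move files matching `pattern` out of data_dir into a timestamped backup folder."""
    data_dir = Path(data_dir)
    # Hypothetical naming scheme for the backup folder.
    backup_dir = Path(backup_root) / time.strftime("zzbrowse-data-%Y%m%d-%H%M%S")
    moved = []
    for path in sorted(data_dir.glob(pattern)):
        if path.is_file():
            backup_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(backup_dir / path.name))
            moved.append(path.name)
    return backup_dir, moved


# Example call (the backup location and the pattern are assumptions):
# back_up_and_remove("www/config/data", "backups", pattern="*")
```

After the move, launching ZZBrowse should find the files missing and regenerate them as described above.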
-
If your organism file does not already exist, create it in the organisms subdirectory.
Line 1 - the organism display name
Line 2 - its chromosome lengths, either numeric or in the form name:length
Line 3 - forms of the organism name: Genus species,G.species,Gensp
Line 4 - URL or local file path of the annotations file (the one from the first step above)
Line 5 - full chromosome name format, as in the annotations file. ZZBrowse will automatically create the short display format and matching regex from this.
Line 6 - base URL for Services API genomic linkage queries
Line 7 - tags for constructing annotations table: strand column name, forward strand code, reverse strand code, start-of-gene column name, end-of-gene column name, URL format for returning gene links, gene id column name (to plug into URL format), gene name column name, chromosome column name, gene description column name
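To make the seven-line layout concrete, here is a small standalone Python sketch that reads such a file and checks its shape. It is not ZZBrowse code. It assumes that lines 2, 3 and 7 are comma-separated (the page shows commas only for line 3) and that line 7 holds exactly the ten tags listed above. Every value in the sample, including the chromosome name format and the "%s" placeholder in the gene URL, is hypothetical.

```python
def parse_organism_file(text):
    """Split a seven-line organism file into named fields (comma separators assumed)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 7:
        raise ValueError(f"expected 7 lines, found {len(lines)}")
    display_name, lengths, names, annotations, chr_format, services_url, tags = lines

    chromosomes = []
    for index, item in enumerate(lengths.split(","), start=1):
        # Accept either "length" or "name:length"; unnamed entries get their position.
        name, sep, length = item.strip().rpartition(":")
        chromosomes.append((name if sep else str(index), int(length)))

    tag_list = [tag.strip() for tag in tags.split(",")]
    if len(tag_list) != 10:
        raise ValueError(f"expected 10 annotation tags, found {len(tag_list)}")

    return {
        "display_name": display_name,
        "chromosomes": chromosomes,
        "names": [name.strip() for name in names.split(",")],
        "annotations": annotations,
        "chromosome_format": chr_format,
        "services_url": services_url,
        "tags": tag_list,
    }


# Entirely hypothetical organism file, for illustration only.
sample = """Example bean
Chr01:1000,Chr02:2000
Genus species,G.species,Gensp
https://example.org/annotations.gff3.gz
example.Chr01
https://example.org/services
strand,+,-,start,end,https://example.org/gene/%s,id,name,chr,description
"""
organism = parse_organism_file(sample)
print(organism["chromosomes"])  # [('Chr01', 1000), ('Chr02', 2000)]
```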
-
In www/config/datasetProperties.csv, add a line for each of your new GWAS and/or QTL datasets.
dataset = the dataset's display name.
chrColumn = which column in the dataset contains the chromosome name. Note that this must begin with "chr" (case-insensitive).
bpColumn = which column contains the SNP position (for GWAS data) or interval center position (for QTL data).
traitCol = which column contains the trait or phenotype.
yAxisColumn = which column contains the p-value (or other significance value or score).
logP = whether to use -log10(yAxisColumn) in the charts (generally TRUE for p-values, FALSE for others).
axisLim = whether to specify hard y-axis limits on the charts (always FALSE for our data).
axisMin = hard bottom of y-axis (or 0 if axisLim = FALSE).
axisMax = hard top of y-axis (or 1 if axisLim = FALSE).
organism = the species to which the dataset refers.
plotAll = whether all data are for the same trait (probably always FALSE for our data).
supportInterval = whether to support interval data, as for QTL data. Set the remaining columns to something meaningful if supportInterval is TRUE:
SIyAxisColumn = which column contains the significance value for interval data ("val" for those we generate on the fly).
SIbpStart = which column contains the start position for interval data.
SIbpEnd = which column contains the end position for interval data.
SIaxisLimBool = whether to specify hard y-axis limits for interval data (always FALSE for our data).
SIaxisMin = hard bottom of interval y-axis (or 0 if SIaxisLimBool = FALSE).
SIaxisMax = hard top of interval y-axis (or 1 if SIaxisLimBool = FALSE).
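The column rules above can be checked mechanically before restarting ZZBrowse. The following standalone Python sketch is not part of ZZBrowse. It assumes that datasetProperties.csv has a header row using exactly the column names listed above and that booleans are written as TRUE or FALSE. The dataset names and column values in the sample are hypothetical.

```python
import csv
import io

COLUMNS = [
    "dataset", "chrColumn", "bpColumn", "traitCol", "yAxisColumn", "logP",
    "axisLim", "axisMin", "axisMax", "organism", "plotAll", "supportInterval",
    "SIyAxisColumn", "SIbpStart", "SIbpEnd", "SIaxisLimBool", "SIaxisMin", "SIaxisMax",
]
BOOLEAN_COLUMNS = ["logP", "axisLim", "plotAll", "supportInterval"]
INTERVAL_COLUMNS = ["SIyAxisColumn", "SIbpStart", "SIbpEnd"]


def check_row(row):
    """Return a list of problems found in one datasetProperties.csv row."""
    problems = [f"missing column {col}" for col in COLUMNS if row.get(col) is None]
    for col in BOOLEAN_COLUMNS:
        if row.get(col) is not None and row[col] not in ("TRUE", "FALSE"):
            problems.append(f"{col} must be TRUE or FALSE")
    if row.get("supportInterval") == "TRUE":
        # The interval columns only need meaningful values for interval data.
        for col in INTERVAL_COLUMNS:
            if not row.get(col):
                problems.append(f"{col} must be set when supportInterval is TRUE")
        if row.get("SIaxisLimBool") not in ("TRUE", "FALSE"):
            problems.append("SIaxisLimBool must be TRUE or FALSE")
    return problems


# Hypothetical file contents: one GWAS row, and one QTL row with SIbpEnd left empty.
sample = (
    ",".join(COLUMNS) + "\n"
    "Example GWAS,chromosome,position,trait,pvalue,TRUE,FALSE,0,1,"
    "Example organism,FALSE,FALSE,,,,FALSE,0,1\n"
    "Example QTL,chromosome,center,trait,val,FALSE,FALSE,0,1,"
    "Example organism,FALSE,TRUE,val,start,,FALSE,0,1\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    print(row["dataset"], check_row(row))
```

The first row passes with no problems. The second is reported because supportInterval is TRUE while SIbpEnd is empty.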
-
Tell ZZBrowse where to find your data:
buildGWAS.R - add its lis.datastore.info
buildQTL.R - add its lis.datastore.info
server.R - add it to lis.datastore.gwas or lis.datastore.qtl
For GWAS data that live outside the data store: in buildGWAS.R, add any remote GWAS URLs, specify their column names, and do any special handling.
- Other notes
buildQTL.R needs no p-value column as it automatically generates a column of 0s and dynamically assigns the y-value for the QTL bars.
Combined GWAS-QTL datasets: use combine-gwas-qtl.R after generating the GWAS and QTL datasets.
Also to do: investigate eliminating legumeInfo.organisms (unused?) from server.R