Strains & Isotypes - AndersenLab/CAENDR GitHub Wiki
This page describes the site features that are built on the strains dataset.
The source of truth for the set of strains is a set of Google Sheets documents, one for each species on the site. These documents are used to build a SQL table named strains, which in turn populates the strain queries across the site.
Uses of the strains dataset include:
- Populating the strains map
- Populating the isotype tables & individual pages
- Populating the Genome Browser strain selector
- etc.
The Google Sheets document IDs are specified with a set of GCP secret values, one for each species:
ANDERSEN_LAB_STRAIN_SHEET_{ SPECIES }
Each secret value is the part of the URL that identifies the document. For example, the URL will look something like:
https://docs.google.com/spreadsheets/d/{ SHEET_ID }
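As an illustration of the naming scheme above, here is a minimal Python sketch that pulls the document ID out of a Google Sheets URL and builds the matching secret name. The helper names are hypothetical and not part of the codebase, and the upper-casing of the species identifier is an assumption; only the secret prefix and the URL shape come from this page.

```python
import re

# Prefix taken from this page; everything else here is illustrative.
SECRET_PREFIX = "ANDERSEN_LAB_STRAIN_SHEET_"

# Matches the document ID segment that follows /spreadsheets/d/ in the URL.
_SHEET_URL = re.compile(r"^https://docs\.google\.com/spreadsheets/d/([A-Za-z0-9_-]+)")


def sheet_id_from_url(url: str) -> str:
    """Return the document ID portion of a Google Sheets URL."""
    match = _SHEET_URL.match(url.strip())
    if match is None:
        raise ValueError(f"Not a Google Sheets document URL: {url!r}")
    return match.group(1)


def secret_name(species: str) -> str:
    """Build the secret name for a species.

    Assumption: the species identifier is upper-cased in the secret name.
    """
    return SECRET_PREFIX + species.strip().upper()
```

For example, a URL ending in /d/abc123/edit would yield the ID abc123, which is the value to store in the secret for that species.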
Note that data entered in the Google Sheets are not immediately made public; instead, the strains SQL table must be built first. See the section below for details.
For more information on these Google Sheets, see the sections Lab Strain Data and SQL Database Source Files of the page Data Dependencies. For more information on how this table is used and how to (re)build it, please consult the page Managing the SQL Database.
Once the Google Sheets are set up, their data can be published to the site by rebuilding the strains table.
To (re)build the SQL table, follow the instructions in the section Building SQL Tables of the page Managing the SQL Database.
NOTE: As of March 2025, the workflow to build the strains table IGNORES any rows that do not have a release value - these are treated as "unreleased". If a strain does not appear in the table after a rebuild, please confirm that it has a valid release value.
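When a strain goes missing after a rebuild, it can help to check the sheet rows for blank release values before digging further. The following is a rough sketch of such a check, not the actual build workflow: it assumes each row has been loaded as a dict keyed by column name, that the column is literally named release, and that only missing or blank values count as "unreleased".

```python
def has_release(row: dict) -> bool:
    """True if the row carries a non-blank release value.

    Assumption: the column is named "release"; missing, None, or
    whitespace-only values are treated as unreleased.
    """
    value = row.get("release")
    if value is None:
        return False
    return str(value).strip() != ""


def split_by_release(rows: list) -> tuple:
    """Split rows into (released, unreleased) lists, preserving order."""
    released = [row for row in rows if has_release(row)]
    unreleased = [row for row in rows if not has_release(row)]
    return released, unreleased
```

Rows that land in the second list are the ones that would be expected to drop out of the strains table under the behavior described in the note above.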
Optionally, one or more photographs may be associated with a strain or isotype. To upload pictures, see the section Strain Photos of the page Data Dependencies.
Front-end users may download the strains list from the Dataset Releases page. This generates a file on-the-fly by dumping the data in the SQL table.
As of March 2025, this is generated on the route /download/<species_name>/<release_name>/strain-data/<file_ext>.
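To make the route pattern concrete, here is a small Python sketch that fills in the three placeholders. The function name is hypothetical, and the example values mentioned afterwards (species, release, and extension) are assumptions for illustration rather than values confirmed by this page.

```python
from urllib.parse import quote


def strain_download_path(species_name: str, release_name: str, file_ext: str) -> str:
    """Fill in the download route pattern described above.

    Each segment is percent-encoded so an unexpected character cannot
    change the shape of the path.
    """
    species = quote(species_name, safe="")
    release = quote(release_name, safe="")
    ext = quote(file_ext, safe="")
    return f"/download/{species}/{release}/strain-data/{ext}"
```

With the hypothetical inputs c_elegans, 20250101, and csv, this produces /download/c_elegans/20250101/strain-data/csv.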
NOTE: As of March 2025, older dataset release pages do NOT filter out newer strains, i.e. there is no such thing as downloading a "historical" strain list. I believe the assumption is that end-users may apply this filter themselves on the downloaded dataset.
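An end-user could apply that filter to the downloaded file along these lines. This is only a sketch under stated assumptions: that the download is a CSV, that it includes a column named release, and that release values are date-like integers (for example 20200101) that sort chronologically. None of these details are confirmed by this page.

```python
import csv
import io


def strains_as_of(csv_text: str, max_release: int) -> list:
    """Keep only rows whose release value is at or before max_release.

    Assumptions: a "release" column exists and holds integer-like values
    that compare chronologically. Rows with blank or non-numeric release
    values are dropped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = []
    for row in reader:
        value = (row.get("release") or "").strip()
        if value.isdigit() and int(value) <= max_release:
            kept.append(row)
    return kept
```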
A few quick notes on features of the site that are populated from the strains list.
- The Genome Browser tool pulls its list of strains from the strains table.
- The Pairwise Indel Finder tool does NOT pull its list of strains from the strains table; instead, it uses the BED and VCF files provided in the pinned release (release_pif). For more information, see the section Pairwise Indel Finder Release of the page Dataset Releases.
- If a strain does not appear in the right position on the map, double-check the coordinates recorded in the strain sheet. For example, try adding or removing a negative sign on the latitude or the longitude - a flipped sign can move the strain to a very different location.
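For the map-position check in the last item above, a quick sanity check of the recorded coordinates might look like the following Python sketch. The function names are hypothetical; the only facts relied on are the standard valid ranges for latitude (-90 to 90) and longitude (-180 to 180).

```python
def coordinate_problems(latitude: float, longitude: float) -> list:
    """Return a list of human-readable problems with a coordinate pair."""
    problems = []
    if not -90 <= latitude <= 90:
        problems.append(f"latitude {latitude} is outside [-90, 90]")
    if not -180 <= longitude <= 180:
        problems.append(f"longitude {longitude} is outside [-180, 180]")
    return problems


def sign_flip_candidates(latitude: float, longitude: float) -> list:
    """List the alternative positions produced by flipping one or both signs.

    Plotting these next to the recorded point is a quick way to see
    whether a dropped or extra minus sign explains a misplaced strain.
    """
    return [
        (latitude, -longitude),
        (-latitude, longitude),
        (-latitude, -longitude),
    ]
```

An out-of-range latitude can also hint that the two values were swapped when they were entered in the sheet.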
Front-end user strain submission is handled through Google Forms. The relevant form is linked in the top navbar of the site, and should be under the lab's ownership. Users are linked out to the Form, so none of this is handled in this codebase.
Front-end users may request strains through the "Request Strains" workflow. All values in the relevant tables are pulled from the strains table.