Tokens

Tokens are placeholders whose values vary and are substituted into strings. They are typically used in filenames and filepaths, for example to distinguish between data files for different species or different releases.

For example, some filepaths may be given in the format:

... /{ SPECIES }/{ RELEASE }/sample_file_name-{ SPECIES }-{ RELEASE }.txt

If we wanted to use the version of this file associated with C. elegans from the (made-up) CaeNDR release on Jan 1, 2000, we would look for the file:

... /c_elegans/20000101/sample_file_name-c_elegans-20000101.txt
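The substitution above can be sketched in a few lines of Python. This is a minimal illustration, not the CaeNDR implementation: the function name `fill_tokens` and the regular expression are assumptions, and the pattern simply accepts the `{ NAME }` spelling shown in the example (with or without the inner spaces).

```python
import re

# Matches "{ NAME }" with optional whitespace inside the braces.
# (Assumed token syntax, based only on the example path above.)
TOKEN_PATTERN = re.compile(r"\{\s*([A-Z_]+)\s*\}")


def fill_tokens(template: str, values: dict[str, str]) -> str:
    """Replace every { TOKEN } in `template` with its value from `values`."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            # Fail loudly rather than leave a half-filled path behind.
            raise KeyError(f"No value provided for token {name!r}")
        return values[name]

    return TOKEN_PATTERN.sub(substitute, template)


template = "{ SPECIES }/{ RELEASE }/sample_file_name-{ SPECIES }-{ RELEASE }.txt"
path = fill_tokens(template, {"SPECIES": "c_elegans", "RELEASE": "20000101"})
print(path)  # c_elegans/20000101/sample_file_name-c_elegans-20000101.txt
```

Raising on a missing token is a design choice for the sketch; a real loader might instead leave unknown tokens in place so they can be filled in a later pass.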

Token List

Token	Description	Values
`SPECIES`	The relevant Caenorhabditis species for a given dataset or operation. The (unofficial) format is genus initial + underscore + species name -- in our case, this means all valid values begin with `c_`.	`c_elegans`, `c_briggsae`, `c_tropicalis`
`RELEASE`	The CaeNDR release that a data file was released in. New values are added when new dataset releases are created.	"YYYYMMDD" format
`SVA`	The CaeNDR release to use for the Strain Variant Annotation table & tool. This has the same format (and possible values?) as the `RELEASE` token, but is kept separate because this tool may lag behind the most recent CaeNDR release - i.e. the `SVA` release and the "main" release may be different.	"YYYYMMDD" format, same as `RELEASE`
`GENOME`	An identifier for the genome attached to a given release, relevant for locating the correct FASTA file. I believe this is now a more generic version of the `WB` token, since CaeNDR expanded from using specifically WormBase data to also using lab-produced genomes. Typically 1-to-1 with `RELEASE` values, but (a) it is possible for multiple CaeNDR releases to use the same genome, and (b) the genome value may be lab-internal or lab-specific, i.e. have some meaning within the lab that isn't captured by the CaeNDR release value alone.	Could be anything - `WS276` (from WormBase), `Feb2020` (from Andersen Lab), etc.
`STRAIN`	An identifier for a specific strain. Mostly relevant for the Genome Browser tool's IGV Browser element, which needs to pull files for specific strains.	`BRC20067`, etc.
`USER_ID`	The unique internal ID of a CaeNDR user who submitted a given data file, e.g. a phenotype trait file.	GCP Datastore Key
`PRJ`	WormBase project number associated with a dataset or release. Pulled from WormBase. May not be relevant going forward, with new CaeNDR release format.	`PRJNA13758`, etc.
`WB`	WormBase version associated with a dataset or release. Pulled from WormBase. May not be relevant going forward, with new CaeNDR release format.	`WS276`, etc.

A few notes:

Probably 85~90% of the time, you only need to care about the SPECIES and RELEASE tokens. All the others are pretty niche and context-specific; the SVA token, for example, is pretty much only relevant to the Strain Variant Annotation tool.
The SVA token may be referred to simply as RELEASE, if the Strain Variant Annotation versioning scheme isn't relevant from a particular perspective. (This can be a bit confusing, but also, it's pretty infrequent that we actually care about this difference.)
The GENOME token value is relevant when creating a new dataset release, and will appear on the Dataset Release page. It's mostly used to associate lab-internal versioning with CaeNDR release versioning. It's possible for multiple releases to use the same GENOME value, if they're based on the same FASTA genome file.
The tokens PRJ and WB are holdovers from an older dataset versioning system, and don't appear to be very relevant going forward. It may be helpful to know what they meant, though, if dealing with old data.
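Since SPECIES and RELEASE cover most use cases, it can help to sanity-check their values before building a path. The sketch below is an assumption-laden example, not code from the CaeNDR source: the helper names are hypothetical, the species set just copies the three values listed in the table, and the release check only enforces the "YYYYMMDD" format described there.

```python
from datetime import datetime

# Species values copied from the token table; the real site may support others.
KNOWN_SPECIES = {"c_elegans", "c_briggsae", "c_tropicalis"}


def is_valid_species(value: str) -> bool:
    """True if `value` is one of the species listed in the token table."""
    return value in KNOWN_SPECIES


def is_valid_release(value: str) -> bool:
    """True if `value` is eight ASCII digits forming a real calendar date."""
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        # Eight digits, but not a real date (e.g. February 30th).
        return False
    return True


print(is_valid_species("c_elegans"))   # True
print(is_valid_release("20000101"))    # True
print(is_valid_release("20000230"))    # False
```

The same release check would apply to SVA values, since that token is described as sharing the RELEASE format.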

Site Environment Variables

This is NOT an exhaustive list of all environment variables required to use the site! Rather, this is an outline of most of the "custom" environment variables defined & used by the CaeNDR source code, specifically for running the site.

All environment variables here are directly read & used in the source code itself.

Module Configuration

Variable	Description	Type
`MODULE_{NAME}_CONTAINER_NAME`	The name of the Docker container to look for to access this module.	`string`
`MODULE_{NAME}_CONTAINER_VERSION`	The version tag to look for on the Docker container.	`string`

Note that the Image Thumbnail Generator module uses a slightly different format:

Variable	Description	Type
`MODULE_IMG_THUMB_GEN_SOURCE_PATH`	The path in GCP where images are held.	`string`
`MODULE_IMG_THUMB_GEN_VERSION`	The version tag to look for on the Docker container. (Same as above)	`string`
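As an illustration of the general `MODULE_{NAME}_CONTAINER_NAME` / `MODULE_{NAME}_CONTAINER_VERSION` pair, the sketch below reads both variables and joins them into a Docker-style `name:tag` reference. The function name, the module name `SITE`, and the example values are all assumptions for the sake of the example; the Image Thumbnail Generator would need its own handling because its variables follow a different pattern.

```python
import os
from typing import Mapping


def module_image_reference(module_name: str, env: Mapping[str, str] = os.environ) -> str:
    """Build "name:tag" from the MODULE_{NAME}_CONTAINER_* variables."""
    prefix = f"MODULE_{module_name.upper()}"
    # A KeyError here means the module's variables are not configured.
    name = env[f"{prefix}_CONTAINER_NAME"]
    version = env[f"{prefix}_CONTAINER_VERSION"]
    return f"{name}:{version}"


# Hypothetical values, passed as a dict so the example does not depend on
# the real environment.
example_env = {
    "MODULE_SITE_CONTAINER_NAME": "example-site",
    "MODULE_SITE_CONTAINER_VERSION": "v0.0.1",
}
print(module_image_reference("site", example_env))  # example-site:v0.0.1
```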

Containerized Tool Configuration

See the tools section below.

GCP Bucket Names

See (buckets page?)

Variable	Description	Type
`MODULE_SITE_BUCKET_PHOTOS_NAME`	The "photos" bucket name.	`string`
`MODULE_SITE_BUCKET_ASSETS_NAME`	The "assets" bucket name.	`string`
`MODULE_SITE_BUCKET_PRIVATE_NAME`	The "private" bucket name.	`string`
`MODULE_SITE_BUCKET_PUBLIC_NAME`	The "public" bucket name.	`string`
`MODULE_DB_OPERATIONS_BUCKET_NAME`	The "db" bucket name.	`string`
`MODULE_SITE_BUCKET_DATASET_RELEASE_NAME`	...	`string`
`ETL_LOGS_BUCKET_NAME`	The "logs" bucket where database operation logs are uploaded. Might be obsolete?	`string`

Override:

Variable	Description	Type
`MODULE_SITE_BUCKET_PUBLIC_NAME_OVERRIDE`	...	`string`

Filepaths & Filenames

All(?) filepaths and filenames are handled as "tokenized strings", i.e. strings that may contain one or more "Token" values (see above).

BAM/BAI Files

Variable	Description	File Type	Type
`BAM_BAI_PREFIX`	The location of the BAM and BAI files within the private bucket.	-	`string` (tokenized)
`BAM_BAI_DOWNLOAD_SCRIPT_NAME`	The filename to download the "download BAM/BAI files" script with; that is, the filename that the user will see when they generate this file & download it.	Bash (`.sh`)	`string` (tokenized)

FASTA Files

Variable	Description	Variable Type
`FASTA_FILENAME_TEMPLATE`	Name of the FASTA file for a given species & release, NOT including the file extension. This is because the full file and the index file share the same name.	`string` (tokenized)
`FASTA_EXTENSION_FILE`	The filename extension for the full FASTA file. Typically `.fa`.	`string`
`FASTA_EXTENSION_INDEX`	The filename extension for the FASTA index file. Typically `.fa.fai`. Note - NOT appended to the plain file extension, so they can be different.	`string`
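The relationship between the three FASTA variables can be shown with a short sketch. The function name and the base name `example_genome` are hypothetical; the base name stands in for `FASTA_FILENAME_TEMPLATE` after its tokens have been filled in.

```python
def fasta_filenames(base_name: str, file_ext: str, index_ext: str) -> tuple[str, str]:
    """Return (full FASTA filename, index filename) for one base name."""
    # Both extensions are appended to the same base name. The index extension
    # is NOT appended to the file extension, so the two can differ freely.
    return base_name + file_ext, base_name + index_ext


fasta, index = fasta_filenames("example_genome", ".fa", ".fa.fai")
print(fasta)  # example_genome.fa
print(index)  # example_genome.fa.fai
```

Because the two extensions are independent, a configuration such as `.fasta` for the file and `.fai` for the index would also work without changing the template.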

SQL Table Source Files

Files used primarily to build the SQL database tables. These are all read as tokenized strings, in case the filenames need to change for species / release / etc, but as of March 2025, only the SVA_CSVGZ filename actually uses any tokens.

Filepaths:

Variable	Description	Variable Type
`MODULE_DB_OPERATIONS_RELEASE_FILEPATH`	The filepath in the `DB_OPERATIONS` bucket containing gene files below.	`string` (tokenized)
`MODULE_DB_OPERATIONS_SVA_FILEPATH`	The filepath in the `DB_OPERATIONS` bucket containing the SVA file below.	`string` (tokenized)
`MODULE_DB_OPERATIONS_PHENOTYPE_FILEPATH`	deprecated?	`string` (tokenized)
`MODULE_DB_OPERATIONS_TRAITFILE_PUBLIC_FILEPATH`	The path in the `DB_OPERATIONS` bucket containing the user-uploaded phenotype trait files.	`string`

Filenames:

Variable	Description	File Path	File Type	Type
`GENE_GFF_FILENAME`	Name of the file used to build the "Wormbase Gene Summary" table.	`MODULE_DB_OPERATIONS_RELEASE_FILEPATH`	Zipped GFF (`.gff3.gz`)	`string` (tokenized)
`GENE_GTF_FILENAME`	Name of one file used to build the "Wormbase Gene" table.	`MODULE_DB_OPERATIONS_RELEASE_FILEPATH`	Zipped GTF (`.gtf.gz`)	`string` (tokenized)
`GENE_IDS_FILENAME`	Name of one file used to build the "Wormbase Gene" table.	`MODULE_DB_OPERATIONS_RELEASE_FILEPATH`	Zipped Text (`.txt.gz`)	`string` (tokenized)
`SVA_CSVGZ_FILENAME`	Name of the file used to build the "Strain Variant Annotation" table.	`MODULE_DB_OPERATIONS_SVA_FILEPATH`	Zipped CSV (`.csv.gz`)	`string` (tokenized)

For more information on how these files are used, please consult SQL Database Source Files.
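The pairing of filename variables with filepath variables in the table above can be sketched as a lookup followed by a path join. This is only an illustration under assumptions: the function name and the example values are hypothetical, the result is a path within the bucket rather than a full URL, and any tokens are left unfilled.

```python
import posixpath
from typing import Mapping

# Which filepath variable each filename variable lives under (from the table).
FILENAME_TO_FILEPATH = {
    "GENE_GFF_FILENAME": "MODULE_DB_OPERATIONS_RELEASE_FILEPATH",
    "GENE_GTF_FILENAME": "MODULE_DB_OPERATIONS_RELEASE_FILEPATH",
    "GENE_IDS_FILENAME": "MODULE_DB_OPERATIONS_RELEASE_FILEPATH",
    "SVA_CSVGZ_FILENAME": "MODULE_DB_OPERATIONS_SVA_FILEPATH",
}


def source_file_location(filename_var: str, env: Mapping[str, str]) -> str:
    """Join the configured filepath and filename for one source file."""
    filepath_var = FILENAME_TO_FILEPATH[filename_var]
    # posixpath keeps forward slashes regardless of the local OS.
    return posixpath.join(env[filepath_var], env[filename_var])


# Hypothetical configuration values, not the site's real ones.
example_env = {
    "MODULE_DB_OPERATIONS_SVA_FILEPATH": "example/sva",
    "SVA_CSVGZ_FILENAME": "example-{ SVA }.csv.gz",
}
print(source_file_location("SVA_CSVGZ_FILENAME", example_env))
# example/sva/example-{ SVA }.csv.gz
```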

Miscellaneous Files

Other files used to populate the site.

Variable	Description	File Type	Type
`EULA_FILE_NAME`	The file containing the site's End-User License Agreement.	Markdown (`.md`)	`string`

URLs

Project-Internal URLs

Variable	Description	Type
`MODULE_SITE_HOST`	The root URL that the CaeNDR site should be hosted on.	`string`

Project-External URLs

Variable	Description	Type
`MODULE_SITE_STRAIN_SUBMISSION_URL`	URL for Google Sheet tracking user-submitted strains.	`string`
`SENTRY_URL`	The ingest URL with Sentry for tracking site bugs / errors.	`string`

Misc

Variable	Description	Type
`MODULE_SITE_PASSWORD_PROTECTED`	Whether to request a password to access the site at all. Relevant for QA site.	`boolean`
`MODULE_SITE_CART_COOKIE_NAME`	Name for the cookie to use to store the user's cart, if they are requesting strains.	`string`
`MODULE_SITE_CART_COOKIE_AGE_SECONDS`	Timeout until the cart cookie expires, in seconds.	`int`
`MODULE_SITE_PASSWORD_RESET_EXPIRATION_SECONDS`	Timeout until a "password reset" link expires, in seconds.	`int`
`USER_OWNED_ENTITY_CACHE_AGE_SECONDS`	Timeout for the local cache of User entities, specifically when loading entities that are owned by users. Makes big queries where many objects have the same users much more efficient (e.g. pulling up all of a user's generated reports).	`int`
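Environment variables always arrive as strings, so the `boolean` and `int` types above imply a parsing step. The sketch below shows one way to do that; the helper names and the accepted boolean spellings are assumptions, and the CaeNDR source may parse these values differently.

```python
def parse_bool(raw: str) -> bool:
    """Interpret a string such as "true" or "0" as a boolean."""
    normalized = raw.strip().lower()
    if normalized in {"true", "1", "yes"}:
        return True
    if normalized in {"false", "0", "no", ""}:
        return False
    raise ValueError(f"Cannot interpret {raw!r} as a boolean")


def parse_seconds(raw: str) -> int:
    """Interpret a string as a non-negative number of seconds."""
    value = int(raw)
    if value < 0:
        raise ValueError(f"Timeout must not be negative, got {value}")
    return value


# Hypothetical values for illustration.
example_env = {
    "MODULE_SITE_PASSWORD_PROTECTED": "True",
    "MODULE_SITE_CART_COOKIE_AGE_SECONDS": "3600",
}
print(parse_bool(example_env["MODULE_SITE_PASSWORD_PROTECTED"]))          # True
print(parse_seconds(example_env["MODULE_SITE_CART_COOKIE_AGE_SECONDS"]))  # 3600
```

Rejecting unrecognised spellings, instead of treating any non-empty string as true, avoids the case where `"false"` silently enables password protection.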

Containerized Tool Variables

General Tool Variables

These are variables that (mostly) exist for all three tools. The actual situation is a bit more complicated (because it always is), but as a rule of thumb, these are useful for all the tools.

As a quick refresher, the tools are:

Short Name	Display Name (on CaeNDR site)	Tool Code
Nemascan	Genetic Mapping	`NEMASCAN` (sometimes `NEMASCAN_NXF`)
Heritability	Heritability Calculator	`HERITABILITY`
Indel Primer, Indel Finder	Pairwise Indel Finder	`INDEL_PRIMER`

Variable	Description	Type	Notes
`{TOOL_CODE}_CONTAINER_NAME`	The name of the Docker Container for the relevant tool.	`string`	For historical reasons, the Nemascan variable is actually prefixed with `NEMASCAN_NXF`, instead of just `NEMASCAN`.
`{TOOL_CODE}_TASK_QUEUE_NAME`	The name of the GCP Cloud Task queue that handles job submissions for this tool.	`string`
`{TOOL_CODE}_EXAMPLE_FILE`	The filepath / filename of the example data file for this tool, made available on the CaeNDR site as a sample / template for users to check against before uploading their data. Species-specific.	`string` (tokenized)	Not necessary for Indel Finder.
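The `{TOOL_CODE}` pattern, including the Nemascan exception noted in the table, can be sketched as a small name-building helper. The function and the override table are hypothetical, and the sketch assumes the `NEMASCAN_NXF` prefix applies only to the container name variable, which is all the table above states.

```python
# Tool codes whose container variable uses a different prefix (from the Notes
# column). Whether other suffixes are affected is not documented here.
CONTAINER_PREFIX_OVERRIDES = {"NEMASCAN": "NEMASCAN_NXF"}


def tool_variable_name(tool_code: str, suffix: str) -> str:
    """Return the environment variable name for a tool and a suffix."""
    if suffix == "CONTAINER_NAME":
        tool_code = CONTAINER_PREFIX_OVERRIDES.get(tool_code, tool_code)
    return f"{tool_code}_{suffix}"


print(tool_variable_name("NEMASCAN", "CONTAINER_NAME"))       # NEMASCAN_NXF_CONTAINER_NAME
print(tool_variable_name("HERITABILITY", "TASK_QUEUE_NAME"))  # HERITABILITY_TASK_QUEUE_NAME
```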

Nemascan

Variables specific to the Nemascan ("Genetic Mapping") tool.

Variable	Description	Type
`NEMASCAN_SOURCE_GITHUB_ORG`	The GitHub organization to pull the Nemascan code from. Used when pushing new versions of the image, in the `nemascan-proxy` module.	`string`
`NEMASCAN_SOURCE_GITHUB_REPO`	The GitHub repository in the above organization to pull the Nemascan code from. Used when pushing new versions of the image, in the `nemascan-proxy` module.	`string`

The tag to publish with is pulled from the command line.

Heritability

Variables specific to the Heritability Calculator tool.

Variable	Description	Type
`HERITABILITY_CONTAINER_VERSION`	The Docker image version tag to use when pushing new versions of the Heritability tool. Used in the `heritability_proxy` module.	`string`

Pairwise Indel Finder

Variables specific to the Pairwise Indel Finder (or "Indel Primer") tool.

Variable	Description	Type
`INDEL_PRIMER_TOOL_PATH`	The GCP path to pull Indel Finder static data files from (BED, VCF, index files, etc). Located in the private bucket.	`string`
`INDEL_PRIMER_SOURCE_FILENAME`	The naming schema for the BED and VCF files. Omits the file extension, since there are multiple files of different types using this same name.	`string` (tokenized)

Google Cloud Platform Variables

Variables used to configure GCP access. See GCP docs for details.

GCP Project Configuration

Token	Type
`GOOGLE_CLOUD_PROJECT_ID`	`string`
`GOOGLE_CLOUD_PROJECT_NUMBER`	`string`
`GOOGLE_CLOUD_REGION`	`string`
`GOOGLE_CLOUD_ZONE`	`string`
`GOOGLE_CLOUD_APP_LOCATION`	`string`

Google Datastore

Token	Type
`GOOGLE_STORAGE_SERVICE_ACCOUNT_NAME`	`string`

Google SQL

Token	Type
`GOOGLE_CLOUDSQL_SERVICE_ACCOUNT_NAME`	`string`

Google Analytics

Token	Type
`GOOGLE_ANALYTICS_SERVICE_ACCOUNT_NAME`	`string`
`GOOGLE_ANALYTICS_PROPERTY_ID`	`string`

Google Sheets

Token	Type
`GOOGLE_SHEETS_SERVICE_ACCOUNT_NAME`	`string`

Tokens & Variables - AndersenLab/CAENDR GitHub Wiki
