Overview

This page describes the workflow for creating a new dataset release for a specific species.

Dataset releases are species-specific and version-specific. This page makes frequent use of the tokens SPECIES and RELEASE for these values. For more information on token values, please see the page Tokens & Variables.

Sections

Data Locations: The set of files involved in a single release, as defined in the codebase.
Release Page: Notes on which parts of the release page are rendered from which files.
Deploying a New Release: Instructions for creating a new dataset release.
Pinned Releases: Notes on which site features may remain pinned to an older release's data files.

Data Locations

This section describes the data available through the individual release pages.

Release Version 2

All files are held in the Dataset Release bucket, under the specified release folder. For more information, see the V2 instance of the ReportType class in the dataset release file (code link).

Report Files

File Name	Filepath
`release_notes`	`release_notes_v2.md`
`summary`	`summary.md`
`methods`	`methods.md`
`alignment_report`	`alignment_report.html`
`gatk_report`	`gatk_report.html`
`concordance_report`	`concordance_report.html`

Divergent Regions

File Name	Filepath
`divergent_regions_strain_bed_gz`	`browser_tracks/{RELEASE}_{SPECIES}_divergent_regions_strain.bed.gz`
`divergent_regions_strain_bed`	`browser_tracks/{RELEASE}_{SPECIES}_divergent_regions_strain.bed`

Filters

File Name	Filepath
`soft_filter_vcf_gz`	`variation/WI.{RELEASE}.soft-filter.vcf.gz`
`soft_filter_vcf_gz_tbi`	`variation/WI.{RELEASE}.soft-filter.vcf.gz.tbi`
`soft_filter_isotype_vcf_gz`	`variation/WI.{RELEASE}.soft-filter.isotype.vcf.gz`
`soft_filter_isotype_vcf_gz_tbi`	`variation/WI.{RELEASE}.soft-filter.isotype.vcf.gz.tbi`
`hard_filter_vcf_gz`	`variation/WI.{RELEASE}.hard-filter.vcf.gz`
`hard_filter_vcf_gz_tbi`	`variation/WI.{RELEASE}.hard-filter.vcf.gz.tbi`
`hard_filter_isotype_vcf_gz`	`variation/WI.{RELEASE}.hard-filter.isotype.vcf.gz`
`hard_filter_isotype_vcf_gz_tbi`	`variation/WI.{RELEASE}.hard-filter.isotype.vcf.gz.tbi`
`impute_isotype_vcf_gz`	`variation/WI.{RELEASE}.impute.isotype.vcf.gz`
`impute_isotype_vcf_gz_tbi`	`variation/WI.{RELEASE}.impute.isotype.vcf.gz.tbi`

Filter Trees

File Name	Filepath
`hard_filter_min4_tree`	`tree/WI.{RELEASE}.hard-filter.min4.tree`
`hard_filter_min4_tree_pdf`	`tree/WI.{RELEASE}.hard-filter.min4.tree.pdf`
`hard_filter_isotype_min4_tree`	`tree/WI.{RELEASE}.hard-filter.isotype.min4.tree`
`hard_filter_isotype_min4_tree_pdf`	`tree/WI.{RELEASE}.hard-filter.isotype.min4.tree.pdf`

Haplotypes

File Name	Filepath
`haplotype_png`	`haplotype/haplotype.png`
`haplotype_pdf`	`haplotype/haplotype.pdf`
`sweep_pdf`	`haplotype/sweep.pdf`
`sweep_summary_tsv`	`haplotype/sweep_summary.tsv`

Transposons

File Name	Filepath
`transposon_calls`	`{RELEASE}_{SPECIES}_transposon_calls.bed`
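
As a rough illustration of how the filepath templates above expand, the sketch below substitutes the RELEASE and SPECIES tokens into a subset of the templates and reports which expected paths are absent from a list of uploaded files. The helper names (`resolve_paths`, `missing_files`) and the example token values are hypothetical and are not taken from the codebase; the actual definitions live in the ReportType class.

```python
# Filepath templates copied from the tables above (a subset, for brevity).
V2_TEMPLATES = {
    "release_notes": "release_notes_v2.md",
    "divergent_regions_strain_bed_gz": "browser_tracks/{RELEASE}_{SPECIES}_divergent_regions_strain.bed.gz",
    "hard_filter_vcf_gz": "variation/WI.{RELEASE}.hard-filter.vcf.gz",
    "hard_filter_vcf_gz_tbi": "variation/WI.{RELEASE}.hard-filter.vcf.gz.tbi",
    "hard_filter_isotype_min4_tree_pdf": "tree/WI.{RELEASE}.hard-filter.isotype.min4.tree.pdf",
    "haplotype_png": "haplotype/haplotype.png",
    "transposon_calls": "{RELEASE}_{SPECIES}_transposon_calls.bed",
}


def resolve_paths(release, species, templates=V2_TEMPLATES):
    """Fill the RELEASE and SPECIES tokens into every template (hypothetical helper)."""
    return {
        name: template.format(RELEASE=release, SPECIES=species)
        for name, template in templates.items()
    }


def missing_files(expected_paths, uploaded):
    """Return the expected paths that do not appear in the uploaded file list."""
    uploaded = set(uploaded)
    return sorted(path for path in expected_paths.values() if path not in uploaded)


# Example token values are placeholders; see Tokens & Variables for real ones.
paths = resolve_paths("20250101", "c_elegans")
print(paths["hard_filter_vcf_gz"])
print(missing_files(paths, ["haplotype/haplotype.png"]))
```

The paths are relative to the release folder in the Dataset Release bucket, so a real check would compare against object names listed under that folder.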

Release Page

This section describes how the release data is used to render the release page tabs.

Release Notes

The "Release Notes" section is rendered from the Markdown file release_notes, and the "Release Summary" section is rendered from the Markdown file summary. For the locations of these files, see the section Report Files.

NOTE: In the Release Summary, the "Genome" value should correspond with the GENOME token for this release. For more information on tokens, please see the page Tokens & Variables.

Other Tabs

Methods

Rendered from the methods Markdown file. For the location of this file, see the section Report Files.

Alignment Summary

Rendered from the alignment_report HTML file. For the location of this file, see the section Report Files.

Variant Summary

Rendered from the gatk_report HTML file. For the location of this file, see the section Report Files.

Concordance

Rendered from the concordance_report HTML file. For the location of this file, see the section Report Files.

Haplotypes

Rendered from the haplotype_png and haplotype_pdf files. For the locations of these files, see the section Haplotypes.

Swept Haplotypes

Rendered from the sweep_pdf and sweep_summary_tsv files. For the locations of these files, see the section Haplotypes.

Species Tree

Rendered from the hard_filter_isotype_min4_tree_pdf file. For the location of this file, see the section Filter Trees.
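
The tab-to-file relationships in this section can be summarized as a lookup table. The sketch below is only a checklist aid for confirming that every file a tab depends on has been uploaded; the names `TAB_SOURCES` and `tabs_with_all_files` are hypothetical, and how the site itself behaves when a file is missing is not covered here.

```python
# Tab name -> file keys it is rendered from, as described in this section.
TAB_SOURCES = {
    "Release Notes": ["release_notes", "summary"],
    "Methods": ["methods"],
    "Alignment Summary": ["alignment_report"],
    "Variant Summary": ["gatk_report"],
    "Concordance": ["concordance_report"],
    "Haplotypes": ["haplotype_png", "haplotype_pdf"],
    "Swept Haplotypes": ["sweep_pdf", "sweep_summary_tsv"],
    "Species Tree": ["hard_filter_isotype_min4_tree_pdf"],
}


def tabs_with_all_files(available_file_keys):
    """List the tabs whose source files are all present (hypothetical helper)."""
    available = set(available_file_keys)
    return [
        tab for tab, needed in TAB_SOURCES.items()
        if available.issuperset(needed)
    ]


# Only "Methods" is complete here: "Haplotypes" also needs haplotype_pdf.
print(tabs_with_all_files(["methods", "haplotype_png"]))
```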

Deploying a New Release

This section describes how to create and publish a new dataset release.

NOTE: As of March 2025, not all tasks are automated yet! Parts of this flow changed during the site-v2 development cycle, and a few steps must be performed manually. Further development might change this.

Pre-Requisites

To deploy a new release, you will need:

Admin access to the CaeNDR site
Access to the datastore back-end (GCP)

The datastore access is required to manually fill out a few fields & make some updates that have not yet been integrated into the automated new release flow.

Instructions

Upload all relevant files to the new release bucket. For more information on required files, see the section Data Locations and/or consult the spreadsheet of required data.
Log in to the site as an admin user and navigate to the Admin portal.
Under the section "Content Updates", select "Update 'Download Data' Releases". (This may change to a new name.)

Click "Create Release", and fill out the form:

Field	Value
Dataset Release Version	The `RELEASE` value for this new release. See the page Tokens & Variables for details on the appropriate value. NOTE: Remember this value for later steps!
Wormbase Version	The `GENOME` value for this new release. See the page Tokens & Variables for details on the appropriate value.
Report Type	Select `V2` (or otherwise the most recent version).
Disabled; Hidden	If you wish to keep the release hidden from public users, you may select one of these.

When you are finished, click "Save".

In the GCP datastore back-end, navigate to "Datastore Studio", and select the database for this project.

Query by the kind dataset_release, and locate the release you just created; you should be able to find it by the fields version and/or created_on. Here, you will need to add the following fields:

Field	Value
`browser_tracks`	The list of tracks to make available in the Genome Browser tool. Unless this list has changed, you can copy the value from the previous release OF THE CURRENT SPECIES.
`genome`	The `GENOME` release value. See the page Tokens & Variables for details on the appropriate value.
`species`	The `SPECIES` release value. See the page Tokens & Variables for details on the appropriate value.

When you are finished, save the new entity.

Query by the kind species, and locate the entity representing the species for the new release.
Update the species entity's release_latest value to the RELEASE value of the new release, i.e. the value in the release's version field. If the species entity has a latest_release field, update it to the new release value as well.
- The value release_latest is the one that is used across the site.
- The value latest_release is a legacy value and can likely be dropped, but it is included here for completeness.
If applicable, update the species entity's other release_ fields as well:
- If the Pairwise Indel Finder tool should switch over to the new release as well, update the release_pif value to the new version; otherwise, leave the value as-is. For more information, see the section Pairwise Indel Finder Release.
- If the Strain Variant Annotation tool should switch over to the new release as well, update the release_sva field (and the legacy sva_ver field, if it exists) to the new version; otherwise, leave the value(s) as-is. For more information, see the section Strain Variant Annotation Release.
When you are finished, save the species entity.
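
The manual datastore edits in the steps above can be sketched as two small functions over plain dictionaries. This is a minimal illustration of the update rules only, assuming the entities are represented as dicts; the function names are hypothetical, and the sketch does not read from or write to the datastore.

```python
# Fields that must be added by hand to the new dataset_release entity.
REQUIRED_RELEASE_FIELDS = ("browser_tracks", "genome", "species")


def missing_release_fields(release_entity):
    """Return the manually-added fields that are still absent or empty."""
    return [f for f in REQUIRED_RELEASE_FIELDS if not release_entity.get(f)]


def apply_new_release(species_entity, release, switch_pif=False, switch_sva=False):
    """Return a copy of the species entity updated for the new release.

    Legacy fields (latest_release, sva_ver) are only touched if they exist.
    """
    updated = dict(species_entity)
    updated["release_latest"] = release
    if "latest_release" in updated:
        updated["latest_release"] = release
    if switch_pif:
        updated["release_pif"] = release
    if switch_sva:
        updated["release_sva"] = release
        if "sva_ver" in updated:
            updated["sva_ver"] = release
    return updated


# Placeholder values; real ones come from the Tokens & Variables page.
species = {"release_latest": "OLD", "release_pif": "OLD", "release_sva": "OLD", "sva_ver": "OLD"}
print(apply_new_release(species, "NEW", switch_sva=True))
print(missing_release_fields({"version": "NEW", "genome": "WS000"}))
```

Returning a copy rather than mutating in place makes it easy to compare the entity before and after the change when reviewing the edit.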

Pinned Releases

A few of the site features may be pinned to older releases, i.e. they may continue to use data from a previous release until that data is ready for the most recent release.

Pairwise Indel Finder Release

The Pairwise Indel Finder tool uses the value release_pif. This is used to determine the BED and VCF files used in the display & operation of the tool, as well as the list of available strains.

Strain Variant Annotation Release

The Strain Variant Annotation tool uses the value release_sva. In older versions of the codebase, this field was called sva_ver.

This may be used to determine which SVA_CSVGZ file to use when building the strain_variant_annotation SQL table, by using the token SVA in the filename template. For more on this variable, see the Strain Variant Annotation section of the Data Dependencies page.
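
To see at a glance which tools are still pinned to an older release, one could compare each tool's release field against release_latest. The sketch below assumes the species entity is available as a plain dict; the tool keys and the function name `pinned_tools` are hypothetical labels chosen for illustration.

```python
# Tool label (hypothetical) -> species entity field that selects its release.
TOOL_RELEASE_FIELDS = {
    "pairwise_indel_finder": "release_pif",
    "strain_variant_annotation": "release_sva",
}


def pinned_tools(species_entity):
    """Map each tool to its release if that release differs from release_latest."""
    latest = species_entity["release_latest"]
    pinned = {}
    for tool, field in TOOL_RELEASE_FIELDS.items():
        value = species_entity.get(field)
        # A missing field is skipped rather than reported as pinned.
        if value is not None and value != latest:
            pinned[tool] = value
    return pinned


# Placeholder release values for illustration only.
entity = {"release_latest": "NEW", "release_pif": "OLD", "release_sva": "NEW"}
print(pinned_tools(entity))
```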

Dataset Releases - AndersenLab/CAENDR GitHub Wiki
