Background - ncbi/workshop-asm-ngs-2022 GitHub Wiki

SRA and Associated Repositories

SRA

SRA accepts sequence data and the associated quality scores produced by high-throughput sequencing technologies. Each SRA record (Run) is linked to one BioSample and one BioProject.

The default file format is known as SRA Normalized and uses the extension *.sra. The SRA Toolkit can be used to extract common file formats such as *.sam and *.fastq from these files. Files as originally provided by the submitter are referred to as source files and use common bioinformatics file extensions such as *.bam or *.fastq. Lastly, we offer a file type known as SRA Lite, with the extension *.sralite, which functions like the SRA Normalized format except that it stores a reduced range of quality scores, resulting in smaller files.
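To illustrate why a reduced range of quality scores saves space, the minimal sketch below compresses a simulated Phred+33 quality string before and after collapsing it to two values. The two-level mapping (Q30 for scores at or above a hypothetical Q20 threshold, Q3 otherwise), the helper names, and the use of zlib are all assumptions made for illustration; this is not the actual SRA Lite encoding, and real savings will differ.

```python
import random
import zlib


def compressed_size(quality: str) -> int:
    """Return the zlib-compressed size, in bytes, of an ASCII quality string."""
    return len(zlib.compress(quality.encode("ascii"), 9))


def simulated_full_range(n: int, seed: int = 0) -> str:
    """Simulate n Phred+33 quality characters drawn uniformly from Q2..Q41."""
    rng = random.Random(seed)
    return "".join(chr(33 + rng.randint(2, 41)) for _ in range(n))


def reduced_range(quality: str, threshold: int = 20) -> str:
    """Collapse each score to one of two values.

    Hypothetical scheme: Q30 ('?') if the score is >= threshold, else Q3 ('$').
    """
    return "".join(
        chr(33 + 30) if ord(c) - 33 >= threshold else chr(33 + 3)
        for c in quality
    )


full = simulated_full_range(100_000)
lite = reduced_range(full)
print("full-range bytes:", compressed_size(full))
print("two-level bytes: ", compressed_size(lite))
```

The base calls themselves are untouched in this picture; only the quality alphabet shrinks, and a smaller alphabet carries less information per character for the compressor to preserve.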

BioSample

A BioSample contains descriptive information about the physical biological specimen from which your experimental data are derived. A single BioSample may be linked to many sequence records derived from the same source material.

BioProject

A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium of coordinating organizations.
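The relationships described above (each Run links to exactly one BioSample and one BioProject, while one BioSample may be shared by many Runs) can be sketched as a small data model. The class names, fields, and placeholder accessions below are hypothetical and do not reflect NCBI's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BioProject:
    accession: str
    title: str


@dataclass(frozen=True)
class BioSample:
    accession: str
    description: str


@dataclass(frozen=True)
class Run:
    accession: str
    biosample: BioSample    # each Run points to exactly one BioSample
    bioproject: BioProject  # ...and exactly one BioProject


def runs_by_biosample(runs: list[Run]) -> dict[str, list[str]]:
    """Group Run accessions under the BioSample they were derived from."""
    groups: dict[str, list[str]] = defaultdict(list)
    for run in runs:
        groups[run.biosample.accession].append(run.accession)
    return dict(groups)


# Placeholder accessions, not real records.
project = BioProject("PRJNA_EXAMPLE", "Hypothetical sequencing project")
sample = BioSample("SAMN_EXAMPLE", "Hypothetical specimen")
runs = [
    Run("SRR_EXAMPLE_1", sample, project),
    Run("SRR_EXAMPLE_2", sample, project),
]
print(runs_by_biosample(runs))
```

Because the link is stored on the Run, the many-Runs-per-BioSample direction is recovered by grouping rather than by storing a list on each BioSample.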

Taxonomy

NCBI Taxonomy includes the scientific names and other associated names of all organisms for which sequence data have been submitted to NCBI, together with the names of their higher-order taxa and the hierarchical relationships among all of these names.
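The hierarchy can be pictured as a tree in which every name points to its parent, so a lineage is found by following parent links up to the root. In the sketch below, the PARENT table is a hand-written, abbreviated toy subset (most intermediate ranks omitted, and names used in place of the numeric taxonomy IDs the real database assigns); it is not data loaded from NCBI.

```python
# Abbreviated toy subset: child name -> parent name.
PARENT: dict[str, str] = {
    "Homo sapiens": "Homo",
    "Homo": "Hominidae",
    "Hominidae": "Primates",
    "Primates": "Mammalia",
    "Mammalia": "root",
}


def lineage(name: str, parent: dict[str, str]) -> list[str]:
    """Follow parent links from a taxon to the root, most specific name first."""
    path = [name]
    seen = {name}
    while path[-1] in parent:
        nxt = parent[path[-1]]
        if nxt in seen:  # guard against a malformed table containing a cycle
            raise ValueError(f"cycle detected at {nxt!r}")
        seen.add(nxt)
        path.append(nxt)
    return path


print(" > ".join(reversed(lineage("Homo sapiens", PARENT))))
# root > Mammalia > Primates > Hominidae > Homo > Homo sapiens
```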

SARS-CoV-2 Variant Calling

As part of the NIH ACTIV TRACE Initiative, NCBI has developed a variant calling pipeline and generated VCF files for Illumina, Oxford Nanopore, and PacBio SARS-CoV-2 data submitted to SRA. These results are available as part of our cloud resources.
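VCF is a tab-separated text format in which each data line begins with eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), and header lines begin with "#". A minimal sketch of reading one data line follows. The example record places the well-known A23403G substitution (spike D614G) on the NC_045512.2 reference, but its DP and AF values are invented and it is not copied from NCBI's files; the parser assumes nothing about which INFO keys those files contain. For real analyses, a dedicated VCF library is the safer choice.

```python
from dataclasses import dataclass


@dataclass
class VcfRecord:
    chrom: str
    pos: int             # 1-based position, as in the VCF specification
    ref: str
    alt: list[str]       # ALT may list several comma-separated alleles
    filter_status: str
    info: dict[str, str]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of a single VCF data (non-header) line."""
    if line.startswith("#"):
        raise ValueError("header lines start with '#' and are not records")
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(fields)}")
    chrom, pos, _id, ref, alt, _qual, filter_status, info = fields[:8]
    info_dict: dict[str, str] = {}
    if info != ".":                    # "." marks a missing INFO field
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            info_dict[key] = value     # flag keys without "=" map to ""
    return VcfRecord(chrom, int(pos), ref, alt.split(","), filter_status, info_dict)


line = "NC_045512.2\t23403\t.\tA\tG\t.\tPASS\tDP=250;AF=0.98"
rec = parse_vcf_line(line)
print(rec.chrom, rec.pos, rec.ref, rec.alt, rec.info["AF"])
```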

NCBI Cloud Content

SRA data is available on the Google Cloud Platform (GCP) and Amazon Web Services (AWS) clouds. To the extent possible, NCBI aims to provide equal support for both of these cloud service providers. While today's workshop focuses on GCP, everything shown should be equally possible on AWS.
