Introduction - ncbi/workshop-asm-ngs-2022 GitHub Wiki
Introduction
Learning Objectives
- How to access SRA data in Google Cloud Platform's (GCP) BigQuery
- How to run common queries against SRA metadata in BigQuery
- Different methods for retrieving SRA data in the cloud
- One approach to assessing reference genome coverage
- How to query SARS-CoV-2 SRA data using precalculated variant calling results in BigQuery
Background Knowledge
- General familiarity with writing SQL queries
  - A Google search for "SQL basics" will turn up a number of good tutorials
- General familiarity with Next-Generation Sequencing (NGS) data
- General familiarity with variant calling and the VCF file format
- The VCF specification can be found here
Help Documentation
- General documentation on finding and downloading SRA data can be found here
- Documentation on the SRA Taxonomy Analysis Tool (STAT) can be found here
- General documentation on SRA cloud resources can be found here
- Documentation on NCBI's SARS-CoV-2 Variant Calling Pipeline can be found here
- GCP BigQuery documentation can be found here
- Documentation on the SRA Toolkit can be found on its GitHub page
- minimap2 documentation can be found here
- samtools documentation can be found here
- jq documentation can be found here
- gnuplot documentation can be found here
Other Resources
- The AWS Open Data Program can be found here
- Documentation on GCP Public Datasets can be found here
- AWS Athena documentation can be found here
- The initial publication on STAT can be found here