Pathogen Background - ncbi/workshop-asm-ngs-2022 GitHub Wiki

Background

NCBI Pathogen Detection integrates bacterial and fungal pathogen genomic sequences from numerous ongoing surveillance and research efforts whose sources include food, environmental sources such as water or production facilities, and patient samples. Foodborne, hospital-acquired, and other clinically infectious pathogens are included.

The system provides two major automated real-time analyses. First, it quickly clusters related pathogen genome sequences to identify potential transmission chains, helping public health scientists investigate disease outbreaks. Second, as part of the National Database of Antibiotic Resistant Organisms (NDARO), NCBI screens bacterial genomic sequences with AMRFinderPlus to identify antimicrobial resistance, stress response, and virulence genes. This screening enables scientists to track the spread of resistance genes and to understand the relationships among antimicrobial resistance, stress response, and virulence.

In this workshop we will look at NCBI Pathogen Detection data in Google Cloud, with an emphasis on the antimicrobial resistance data.

Learning Objectives

  • Demonstrate use of BigQuery in the Google Cloud Console and with the bq command-line tool
  • Show how BigQuery can be used to analyze the microbigge, isolates, and isolate_exceptions tables and how they relate to the web interface
  • Demonstrate downloading sequences from the Reference Gene Catalog, phylogenetic analysis of those sequences, and visualization using iTOL
  • Demonstrate using gsutil to download MicroBIGG-E contig sequences from cloud storage buckets
  • Demonstrate the use of seqkit to perform some common operations on FASTA files
  • Show how to slice out coding sequences from contig sequences and perform simple selection analysis on genes in MicroBIGG-E
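
As a preview of the last objective, slicing a coding sequence out of a contig can be sketched in a few lines of Python. This is a minimal illustration, not the workshop's own method: the function names, the toy contig, and the coordinate convention (1-based, inclusive, with a '+' or '-' strand) are all assumptions made for the example, so check the coordinates in the actual MicroBIGG-E data before relying on it.

```python
# Complement table for DNA, including N for ambiguous bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def slice_cds(contig: str, start: int, stop: int, strand: str) -> str:
    """Extract a coding sequence from a contig.

    Assumes 1-based, inclusive coordinates with start <= stop, and a
    strand given as '+' or '-' (assumptions for this sketch).
    """
    if not (1 <= start <= stop <= len(contig)):
        raise ValueError("coordinates fall outside the contig")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    region = contig[start - 1:stop]
    # Genes on the minus strand are read from the opposite strand.
    return reverse_complement(region) if strand == "-" else region


# Toy contig (hypothetical data, not a real MicroBIGG-E record).
contig = "GGATGAAATAACC"
plus_cds = slice_cds(contig, 3, 11, "+")

# The same gene seen on the reverse-complemented contig is on the minus strand.
minus_cds = slice_cds(reverse_complement(contig), 3, 11, "-")
```

Both calls recover the same open reading frame, which shows why the strand has to be taken into account before translating or comparing the extracted sequences.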

Help documentation

Background for workshop

Background PowerPoint


Continue to Project 1