Sample data - rrwick/Perfect-bacterial-genome-tutorial GitHub Wiki

S. aureus JKD6159

The sample data for this tutorial comes from the S. aureus JKD6159 genome, which contains a 2.8 Mbp chromosome and a 21 kbp plasmid. Read more about it here:

If you use this sample data in your research, please cite:

Figshare

The sample data is hosted on figshare: bridges.monash.edu/articles/dataset/S_aureus_JKD6159_sequencing_data/21007033

There you will find:

  • A reference assembly
  • Illumina reads
  • ONT R10.4 reads (raw and basecalled)
  • ONT R9.4.1 reads (raw and basecalled)
  • PacBio RSII reads (raw and basecalled)

The easy and medium tutorials assume you have the reference assembly plus the Illumina and ONT R10.4 reads in FASTQ format. The following commands download these files:

mkdir -p reads reference
wget --no-check-certificate -O reference/S_aureus_JKD6159.fasta https://bridges.monash.edu/ndownloader/files/37312027
wget --no-check-certificate -O reads/S_aureus_JKD6159_Illumina_1.fastq.gz https://bridges.monash.edu/ndownloader/files/37312789
wget --no-check-certificate -O reads/S_aureus_JKD6159_Illumina_2.fastq.gz https://bridges.monash.edu/ndownloader/files/37312840
wget --no-check-certificate -O reads/S_aureus_JKD6159_ONT_R10.4_guppy_v6.1.7.fastq.gz https://bridges.monash.edu/ndownloader/files/37317376
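Once the downloads finish, it can be worth confirming that the files arrived intact before starting a tutorial. The following is a minimal sketch, not part of the tutorial itself: the function names are hypothetical, and the read count assumes standard four-line FASTQ records. It tests each gzipped file for corruption and reports read and sequence counts.

```shell
#!/usr/bin/env bash
# Sketch only: helper names below are hypothetical, not from the tutorial.

# Print the number of reads in a gzipped FASTQ file.
# Assumes four lines per record (no wrapped sequence lines).
count_fastq_reads() {
    local lines
    lines=$(gzip -cd "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Print the number of sequences (header lines) in an uncompressed FASTA file.
count_fasta_seqs() {
    grep -c '^>' "$1"
}

# Check every FASTQ file in reads/ for gzip corruption and report read counts.
check_sample_data() {
    local f
    for f in reads/*.fastq.gz; do
        [ -e "$f" ] || { echo "no FASTQ files found in reads/"; return 1; }
        if gzip -t "$f"; then
            echo "$f: $(count_fastq_reads "$f") reads"
        else
            echo "$f: failed gzip integrity check"
        fi
    done
}
```

To use it, run `check_sample_data` from the directory containing `reads/`, then `count_fasta_seqs reference/S_aureus_JKD6159.fasta`. The two Illumina files should report the same number of reads, and if the reference matches the description above (one chromosome and one plasmid) it should contain two sequences.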

The hard tutorial is more flexible: you can use different long reads (e.g. ONT R9.4.1 or PacBio RSII), raw long reads (so you can do the basecalling yourself), or even your own data from a different genome.

NCBI

If the figshare links aren't working, the sample data is also available on NCBI: