Sample data - rrwick/Perfect-bacterial-genome-tutorial GitHub Wiki

S. aureus JKD6159

The sample data for this tutorial is from the S. aureus JKD6159 genome. This genome contains a 2.8 Mbp chromosome and a 21 kbp plasmid. Read more about it here:

Chua K, Seemann T, Harrison PF, Davies JK, Coutts SJ, Chen H, Haring V, Moore R, Howden BP, Stinear TP. Complete Genome Sequence of Staphylococcus aureus Strain JKD6159, a Unique Australian Clone of ST93-IV Community Methicillin-Resistant Staphylococcus aureus. Journal of Bacteriology. 2010. doi:10.1128/JB.00878-10.
Monk IR, Tree JJ, Howden BP, Stinear TP, Foster TJ. Complete Bypass of Restriction Systems for Major Staphylococcus aureus Lineages. mBio. 2015. doi:10.1128/mBio.00308-15.

If you use this sample data in your research, please cite:

Wick RR, Judd LM, Monk IR, Seemann T, Stinear TP. Improved Genome Sequence of Australian Methicillin-Resistant Staphylococcus aureus Strain JKD6159. Microbiology Resource Announcements. 2023. doi:10.1128/mra.01129-22.

Figshare

The sample data is hosted on figshare: bridges.monash.edu/articles/dataset/S_aureus_JKD6159_sequencing_data/21007033

There you will find:

A reference assembly
Illumina reads
ONT R10.4 reads (raw and basecalled)
ONT R9.4.1 reads (raw and basecalled)
PacBio RSII reads (raw and basecalled)

The easy and medium tutorials assume you have the reference assembly, the Illumina reads and the ONT R10.4 reads, with both read sets in FASTQ format. The following commands download these files:

mkdir -p reads reference
wget --no-check-certificate -O reference/S_aureus_JKD6159.fasta https://bridges.monash.edu/ndownloader/files/37312027
wget --no-check-certificate -O reads/S_aureus_JKD6159_Illumina_1.fastq.gz https://bridges.monash.edu/ndownloader/files/37312789
wget --no-check-certificate -O reads/S_aureus_JKD6159_Illumina_2.fastq.gz https://bridges.monash.edu/ndownloader/files/37312840
wget --no-check-certificate -O reads/S_aureus_JKD6159_ONT_R10.4_guppy_v6.1.7.fastq.gz https://bridges.monash.edu/ndownloader/files/37317376
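Once the downloads finish, it can be worth confirming that the read files arrived intact before starting a tutorial. The following is a minimal sketch and not part of the original tutorial: `count_fastq_reads` and `check_reads` are hypothetical helper names, and the sketch assumes the reads are standard gzipped FASTQ with four lines per record.

```shell
# Hypothetical helper: count the reads in a gzipped FASTQ file.
# Assumes four lines per record (no wrapped sequence lines).
count_fastq_reads() {
    echo $(( $(gzip -cd "$1" | wc -l) / 4 ))
}

# Hypothetical helper: test each file for gzip corruption and,
# if it passes, report how many reads it contains.
check_reads() {
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "$f: OK, $(count_fastq_reads "$f") reads"
        else
            echo "$f: missing or corrupt" >&2
            return 1
        fi
    done
}
```

With these functions defined in the current shell, `check_reads reads/*.fastq.gz` would check all three read files. The two Illumina files should report the same number of reads, since they are paired.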

The hard tutorial is more flexible: you can use different long reads (e.g. ONT R9.4.1 or PacBio RSII), raw long reads (so you can do the basecalling yourself), or even your own data from a different genome.

NCBI

In case the figshare links aren't working, the sample data is also available on NCBI:

Reference assembly
Illumina reads
ONT R10.4 reads: raw, basecalled
ONT R9.4.1 reads: raw, basecalled
PacBio RSII reads: raw, basecalled