Sample data - rrwick/Perfect-bacterial-genome-tutorial GitHub Wiki
S. aureus JKD6159
The sample data for this tutorial is from the S. aureus JKD6159 genome, which consists of a 2.8 Mbp chromosome and a 21 kbp plasmid. Read more about it here:
- Chua K, Seemann T, Harrison PF, Davies JK, Coutts SJ, Chen H, Haring V, Moore R, Howden BP, Stinear TP. Complete Genome Sequence of Staphylococcus aureus Strain JKD6159, a Unique Australian Clone of ST93-IV Community Methicillin-Resistant Staphylococcus aureus. Journal of Bacteriology. 2010. doi:10.1128/JB.00878-10.
- Monk IR, Tree JJ, Howden BP, Stinear TP, Foster TJ. Complete Bypass of Restriction Systems for Major Staphylococcus aureus Lineages. mBio. 2015. doi:10.1128/mBio.00308-15.
If you use this sample data in your research, please cite:
Figshare
The sample data is hosted on figshare: bridges.monash.edu/articles/dataset/S_aureus_JKD6159_sequencing_data/21007033
There you will find:
- A reference assembly
- Illumina reads
- ONT R10.4 reads (raw and basecalled)
- ONT R9.4.1 reads (raw and basecalled)
- PacBio RSII reads (raw and basecalled)
The easy and medium tutorials assume you have the reference assembly (FASTA format) and the Illumina and ONT R10.4 reads (FASTQ format). Here are commands to download these files:
mkdir -p reads reference
wget --no-check-certificate -O reference/S_aureus_JKD6159.fasta https://bridges.monash.edu/ndownloader/files/37312027
wget --no-check-certificate -O reads/S_aureus_JKD6159_Illumina_1.fastq.gz https://bridges.monash.edu/ndownloader/files/37312789
wget --no-check-certificate -O reads/S_aureus_JKD6159_Illumina_2.fastq.gz https://bridges.monash.edu/ndownloader/files/37312840
wget --no-check-certificate -O reads/S_aureus_JKD6159_ONT_R10.4_guppy_v6.1.7.fastq.gz https://bridges.monash.edu/ndownloader/files/37317376
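Because the downloads above skip certificate checking and the read files are large, it may be worth checking the files before starting a tutorial. The following is a minimal sketch, not part of the tutorial itself: the helper names (check_downloads, count_reads, fasta_lengths) are made up for illustration, and it assumes the files were saved under the names used in the commands above, that gzip and awk are available, and that the FASTQ files use four lines per read.

```shell
# Hypothetical helpers for sanity-checking the downloaded sample data.

# Count reads in a gzipped FASTQ file (assumes exactly 4 lines per record).
count_reads() {
    gzip -cd "$1" | awk 'END { print NR / 4 }'
}

# Print the name and length of each sequence in an uncompressed FASTA file.
fasta_lengths() {
    awk '/^>/ { if (name != "") print name "\t" len
                name = substr($1, 2); len = 0; next }
         { len += length($0) }
         END { if (name != "") print name "\t" len }' "$1"
}

# Check that each file exists, is non-empty and (for .gz files) is valid gzip.
# Returns non-zero if any file fails a check.
check_downloads() {
    rc=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "MISSING or empty: $f"
            rc=1
        elif [ "${f##*.}" = "gz" ] && ! gzip -t "$f" 2>/dev/null; then
            echo "CORRUPT gzip: $f"
            rc=1
        else
            echo "OK: $f"
        fi
    done
    return "$rc"
}

check_downloads \
    reference/S_aureus_JKD6159.fasta \
    reads/S_aureus_JKD6159_Illumina_1.fastq.gz \
    reads/S_aureus_JKD6159_Illumina_2.fastq.gz \
    reads/S_aureus_JKD6159_ONT_R10.4_guppy_v6.1.7.fastq.gz \
    || echo "Some files may need to be downloaded again."

# List the sequences in the reference, if it is present.
if [ -s reference/S_aureus_JKD6159.fasta ]; then
    fasta_lengths reference/S_aureus_JKD6159.fasta
fi
```

If the reference assembly matches the genome described at the top of this page, fasta_lengths should list one sequence of roughly 2.8 Mbp and one of roughly 21 kbp. Running count_reads on the two Illumina files should give the same number for both, since they are the two halves of a paired-end read set.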
The hard tutorial is more flexible: you can use different long reads (e.g. ONT R9.4.1 or PacBio RSII), start from raw long reads (so you can do the basecalling yourself), or even use your own data from a different genome.
NCBI
In case the figshare links aren't working, the sample data is also available on NCBI:
- Reference assembly
- Illumina reads
- ONT R10.4 reads: raw, basecalled
- ONT R9.4.1 reads: raw, basecalled
- PacBio RSII reads: raw, basecalled