Benchmarks: File Sizes - simonrharris/SKA GitHub Wiki

File sizes

Creating split kmer files from fasta references (ska fasta)

| Species | Accession Number | Number of Contigs | Genome Length | File Size |
| --- | --- | --- | --- | --- |
| S. aureus | HE681097 | 1 | 2,832,299 | 30Mb |
| C. jejuni | GCA_001879185.1 | 16 | 1,629,708 | 17Mb |
| E. coli | GCA_000703365.1 | 7 | 5,412,686 | 54Mb |
| L. monocytogenes | GCA_001257675.1 | 39 | 2,905,183 | 31Mb |
| S. enterica | GCA_000439415.1 | 2 | 4,808,805 | 50Mb |
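
The file sizes above track genome length closely. The following is a minimal sketch of that arithmetic, using only the rows of the table. It assumes that "Mb" means 10**6 bytes, which the table does not state; the names `ROWS`, `BYTES_PER_MB` and `bytes_per_base` are hypothetical and are not part of SKA.

```python
# Rows copied from the table above: (species, genome length in bases, file size in Mb).
ROWS = [
    ("S. aureus", 2_832_299, 30),
    ("C. jejuni", 1_629_708, 17),
    ("E. coli", 5_412_686, 54),
    ("L. monocytogenes", 2_905_183, 31),
    ("S. enterica", 4_808_805, 50),
]

# Assumption: "Mb" in the table means 10**6 bytes.
BYTES_PER_MB = 1_000_000


def bytes_per_base(genome_length, file_size_mb):
    """Return the split kmer file size divided by the genome length."""
    return file_size_mb * BYTES_PER_MB / genome_length


# Map each species to its file-size-to-genome-length ratio.
ratios = {species: bytes_per_base(length, size) for species, length, size in ROWS}

for species, ratio in ratios.items():
    print(f"{species}: {ratio:.1f} bytes per base")
```

Under that assumption, every row works out to about 10 to 11 bytes of split kmer file per base of genome, which gives a rough way to estimate the file size for a reference of known length.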

Creating split kmer files from fastq files (ska fastq)

| Species | Number of Paired Files | Mean # Reads | Mean # Bases | Maximum File Size | Mean File Size |
| --- | --- | --- | --- | --- | --- |
| S. aureus | 65 | 1,608,999 | 223,069,930 | 30Mb | 30Mb |
| C. jejuni | 22 | 3,361,254 | 657,706,314 | 20Mb | 18Mb |
| E. coli | 9 | 2,122,840 | 356,661,903 | 55Mb | 54Mb |
| L. monocytogenes | 31 | 1,777,006 | 380,854,631 | 31Mb | 31Mb |
| S. enterica | 23 | 1,859,063 | 278,574,418 | 51Mb | 49Mb |

Merging split kmer files (ska merge)

| Species | # Files | Mean # kmers | Merged File Size |
| --- | --- | --- | --- |
| S. aureus | 65 | 2,761,345 | 77Mb |
| S. aureus outbreak | 45 | 2,761,896 | 30Mb |
| C. jejuni | 22 | 1,680,619 | 29Mb |
| E. coli | 9 | 5,116,311 | 56Mb |
| L. monocytogenes | 31 | 2,922,836 | 86Mb |
| S. enterica | 23 | 4,660,457 | 53Mb |
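
To put the merged sizes in context, the sketch below compares each merged file with the approximate combined size of the separate split kmer files. It assumes that the merged files are the same sets reported in the ska fastq table (the file counts match, but the page does not say so explicitly), and it takes the per-file size from the "Mean File Size" column of that table. The S. aureus outbreak row is left out because it has no matching row there. All names in the code are hypothetical.

```python
# (species, number of files, mean file size in Mb from the ska fastq table,
#  merged file size in Mb from the ska merge table)
ROWS = [
    ("S. aureus", 65, 30, 77),
    ("C. jejuni", 22, 18, 29),
    ("E. coli", 9, 54, 56),
    ("L. monocytogenes", 31, 31, 86),
    ("S. enterica", 23, 49, 53),
]

# species -> (approximate total of separate files in Mb, merged size in Mb, ratio)
summary = {}
for species, n_files, mean_mb, merged_mb in ROWS:
    separate_total_mb = n_files * mean_mb  # estimate: file count times mean size
    summary[species] = (separate_total_mb, merged_mb, separate_total_mb / merged_mb)

for species, (separate_mb, merged_mb, ratio) in summary.items():
    print(f"{species}: ~{separate_mb}Mb separate, {merged_mb}Mb merged "
          f"({ratio:.1f}x smaller)")
```

The totals are estimates built from rounded mean file sizes, so the ratios are indicative only.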