ska summary - simonrharris/SKA GitHub Wiki

SKA summary

The summary subcommand prints some summary statistics for a set of split kmer files.

Output columns

| Column | Description |
| --- | --- |
| Sample | The name of the sample being summarised |
| Kmer size | The split kmer size used to create the file |
| Total kmers | Total number of split kmers in the file |
| As | Number of split kmers with an A as the middle base |
| Cs | Number of split kmers with a C as the middle base |
| Gs | Number of split kmers with a G as the middle base |
| Ts | Number of split kmers with a T as the middle base |
| Ns | Number of split kmers with an N as the middle base |
| Others | Number of split kmers with any other letter as the middle base |
| GC Content | The GC content of the middle base of all split kmers |
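As a rough illustration of how the GC Content column relates to the base counts, the sketch below computes a GC fraction from the As, Cs, Gs and Ts columns. The function name `middle_base_gc` is hypothetical, and the choice of denominator (A, C, G and T only, ignoring Ns and Others) is an assumption; the exact formula ska uses is not stated here.

```python
def middle_base_gc(a: int, c: int, g: int, t: int) -> float:
    """Return the GC fraction among A/C/G/T middle bases.

    Assumption: Ns and Others are left out of the denominator.
    ska itself may compute the column differently.
    """
    total = a + c + g + t
    if total == 0:
        raise ValueError("no A/C/G/T middle bases counted")
    return (c + g) / total


# Made-up counts for illustration only.
print(round(middle_base_gc(a=300, c=200, g=200, t=300), 3))
```

With these made-up counts, 400 of the 1000 middle bases are C or G.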

Using SKA summary to QC split kmer files

The summary subcommand is useful for QC purposes. You would expect the number of split kmers in each file to be approximately the length of the genome, or slightly higher. If the number of split kmers is much lower than the expected genome size, the likely cause depends on how the file was produced: if ska fastq was used, the sequence data may not be of high enough quality or depth for the default settings; if ska fasta was used, your assembly may be incomplete. If the number of split kmers is much larger than expected, you may have contamination in your sequencing data, or your data may be of low quality. Similarly, you would expect the GC content of the middle base of the split kmers to be representative of the species being sequenced.
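The kmer-count check described above could be automated along the following lines. This is only a sketch: the function name `qc_flag` and the 0.8 and 1.2 ratio thresholds are hypothetical choices for illustration, not values recommended by ska, and the sample counts are made up.

```python
def qc_flag(total_kmers: int, expected_genome_size: int,
            low_ratio: float = 0.8, high_ratio: float = 1.2) -> str:
    """Classify a sample by its split kmer count relative to genome size.

    The thresholds are illustrative assumptions; tune them per species.
    """
    if expected_genome_size <= 0:
        raise ValueError("expected_genome_size must be positive")
    ratio = total_kmers / expected_genome_size
    if ratio < low_ratio:
        return "low"   # possible low depth/quality reads or incomplete assembly
    if ratio > high_ratio:
        return "high"  # possible contamination or low-quality data
    return "ok"


# Made-up "Total kmers" values for a genome of roughly 2.8 Mb.
samples = {"sampleA": 2_850_000, "sampleB": 1_100_000, "sampleC": 4_900_000}
for name, total in samples.items():
    print(name, qc_flag(total, expected_genome_size=2_800_000))
```

A flag of "low" or "high" is only a prompt to inspect the sample further; the GC content column should be checked against the expected value for the species in the same way.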

Usage

ska summary [options] <split kmer files>

Options:
-f <file>	File of split kmer file names. These will be added to or 
		used as an alternative input to the list provided on the 
		command line.
-h		Print this help
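When there are many split kmer files, the -f option avoids a very long command line. The sketch below writes a file of file names and builds the matching command. It assumes the list file holds one file name per line, which is not confirmed here; the helper name `build_summary_command` and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path


def build_summary_command(skf_files, list_path):
    """Write the file names to list_path and return the ska argv.

    Assumption: ska expects one file name per line in the -f file.
    """
    Path(list_path).write_text("".join(f"{name}\n" for name in skf_files))
    return ["ska", "summary", "-f", str(list_path)]


with tempfile.TemporaryDirectory() as tmp:
    cmd = build_summary_command(
        ["sample1.skf", "sample2.skf"], Path(tmp) / "skf_list.txt"
    )
    print(" ".join(cmd[:3]))  # the list path itself varies per run
```

If ska is installed, the returned list can be passed to `subprocess.run` to execute the command.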