ska unique - simonrharris/SKA GitHub Wiki
The unique subcommand identifies kmers that are unique to a particular set of split kmer files (the ingroup). The ingroup is defined by a file listing the names of the ingroup samples, supplied with the -i flag. The output split kmer file contains every kmer that is present in at least a minimum proportion of the ingroup samples (set with the -p flag) and is not found in any other sample in the input split kmer files.
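The selection rule described above can be sketched in a few lines of Python. This is a toy illustration rather than SKA's implementation: it assumes each sample is a plain set of kmer strings (real split kmer files store flanking sequence plus a middle base), that the threshold is inclusive, and the names `unique_kmers`, `samples` and `min_proportion` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def unique_kmers(
    samples: Dict[str, Set[str]],
    ingroup: Iterable[str],
    min_proportion: float = 0.9,
) -> Set[str]:
    """Return kmers found in at least `min_proportion` of the ingroup
    samples and in none of the remaining (non-ingroup) samples."""
    ingroup_names = set(ingroup)
    if not ingroup_names:
        raise ValueError("ingroup must contain at least one sample")
    missing = ingroup_names - samples.keys()
    if missing:
        raise KeyError(f"ingroup samples not in input: {sorted(missing)}")

    # Every kmer seen in any sample outside the ingroup is disqualified.
    outgroup_kmers: Set[str] = set()
    for name, kmers in samples.items():
        if name not in ingroup_names:
            outgroup_kmers |= kmers

    # Count how many ingroup samples carry each kmer (sets count once each).
    counts: Counter = Counter()
    for name in ingroup_names:
        counts.update(samples[name])

    # Assumption: the threshold is inclusive (>=); SKA may differ.
    n = len(ingroup_names)
    return {
        kmer
        for kmer, count in counts.items()
        if count / n >= min_proportion and kmer not in outgroup_kmers
    }
```

With two ingroup samples, a kmer held by both but also by an outgroup sample is dropped, and a kmer held by only one ingroup sample survives only if the proportion threshold is lowered to 0.5 or below.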
If this method is used to define the kmers unique to a set of outbreak isolates, the output split kmer file can be used as the input query file in ska compare to rapidly assess whether new samples are members of the ingroup.
The output split kmers can be annotated against an assembly of one of the ingroup samples using ska annotate to identify the genomic regions that are unique to the ingroup.
The method could also be used to identify uniquely shared genomic regions in otherwise unrelated genomes, for example to quickly spot shared accessory genome content.
ska unique [options]
Options:
-f <file>    File of split kmer file names. These will be added to or
             used as an alternative input to the list provided on the
             command line.
-h           Print this help.
-i <file>    File of ingroup sample names. Unique kmers found
             in these files will be retained.
-n           Allow Ns in unique split kmers.
-o <file>    Output file prefix. [Default = unique]
-p <float>   Minimum proportion of ingroup isolates required to possess a
             split kmer for that kmer to be retained. [Default = 0.9]
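
As a rough sketch of how the options above fit together, the following Python helper assembles a ska unique command line using only the flags listed in the help text. The helper names, the file names in the usage note, and the assumption that both list files hold one name per line are illustrative and not taken from the SKA documentation.

```python
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def write_name_file(path: PathLike, names: Iterable[str]) -> Path:
    """Write one name per line (assumed format for the -f and -i files)."""
    out = Path(path)
    out.write_text("".join(f"{name}\n" for name in names))
    return out


def build_unique_command(
    skf_list: PathLike,
    ingroup_list: PathLike,
    prefix: str = "unique",
    min_proportion: float = 0.9,
    allow_ns: bool = False,
) -> List[str]:
    """Build an argument list for `ska unique` from the documented flags."""
    if not 0.0 <= min_proportion <= 1.0:
        raise ValueError("min_proportion must be between 0 and 1")
    cmd = [
        "ska", "unique",
        "-f", str(skf_list),        # file of split kmer file names
        "-i", str(ingroup_list),    # file of ingroup sample names
        "-o", prefix,               # output file prefix
        "-p", str(min_proportion),  # minimum ingroup proportion
    ]
    if allow_ns:
        cmd.append("-n")            # allow Ns in unique split kmers
    return cmd
```

For example, `build_unique_command("skfs.txt", "ingroup.txt", prefix="outbreak")` returns a list that can be passed to `subprocess.run` where ska is installed. Whether the ingroup names should include a file extension is not stated in the help text, so check this against your own split kmer files.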