ska annotate - simonrharris/SKA GitHub Wiki
The annotate subcommand locates split kmers in a reference genome sequence and writes their annotations to a vcf (v4.3) format output file.
If the reference is supplied as a gff file, split kmers matching CDS, tRNA or rRNA features are annotated in the info column of the vcf with the following information, where available:
- Feature ID
- Feature type (CDS, tRNA or rRNA)
- Strand
- Position of base in feature
For CDS features, the following are also included where available:
- Locus tag
- Systematic ID
- Gene name
- Position of amino acid in feature
- Position of base in codon
- Reference amino acid
- Alternate amino acids (comma separated list matching the alt bases in the 5th column of the vcf file)
- Product (only output when the -p flag is used)
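The codon-level fields in the list above follow from the position of the base within the CDS. The sketch below shows that arithmetic under stated assumptions: positions are 1-based and counted from the first base of the coding sequence on the feature's own strand, and the standard genetic code applies. SKA's own numbering and translation conventions are not described here and may differ. The function names are hypothetical and are not part of SKA.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
# Assumption: SKA may use a different translation table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_coordinates(base_position):
    """Map a 1-based base position in a CDS to 1-based
    (amino acid position, position of base in codon)."""
    if base_position < 1:
        raise ValueError("base_position must be 1 or greater")
    return (base_position - 1) // 3 + 1, (base_position - 1) % 3 + 1


def amino_acid_changes(cds, base_position, alt_bases):
    """Return the reference amino acid and one alternate amino acid per
    alt base, in the same order as alt_bases."""
    aa_position, codon_position = codon_coordinates(base_position)
    start = (aa_position - 1) * 3
    codon = cds[start:start + 3].upper()
    if len(codon) != 3:
        raise ValueError("base_position falls in an incomplete codon")
    ref_amino_acid = CODON_TABLE[codon]
    alt_amino_acids = [
        CODON_TABLE[codon[:codon_position - 1] + alt.upper() + codon[codon_position:]]
        for alt in alt_bases
    ]
    return ref_amino_acid, alt_amino_acids
```

For a feature on the reverse strand, the coding sequence and the alt bases would need to be reverse complemented before calling these functions; that step is left out of the sketch.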
By default, split kmers that are repetitive in the reference sequence are not annotated. To annotate them, use the -i flag. Bases annotated from repetitive split kmers are labelled with the RR (repeat region) flag in the info column.
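Downstream scripts may want to separate sites annotated from repetitive split kmers from the rest. The sketch below parses the info column of a vcf data line using the general vcf convention of semicolon-separated entries, where an entry without `=` is a flag, and checks for RR. It assumes RR appears as such an entry; the `EXAMPLE` key in the usage line is a made-up placeholder, not a field that ska is known to write.

```python
def parse_info(info_field):
    """Parse a vcf info column into a dict.

    Entries of the form KEY=a,b become {KEY: ["a", "b"]};
    entries without '=' are flags and become {KEY: True}.
    """
    info = {}
    if info_field in ("", "."):
        return info
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, separator, value = entry.partition("=")
        info[key] = value.split(",") if separator else True
    return info


def is_repeat_site(vcf_line):
    """Return True if the info column (8th column) carries the RR label."""
    columns = vcf_line.rstrip("\n").split("\t")
    return "RR" in parse_info(columns[7])


# Hypothetical data line; only the RR label comes from the text above.
example_line = "reference\t100\t.\tA\tG,T\t.\t.\tRR;EXAMPLE=1,2"
print(is_repeat_site(example_line))
```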
By default, all split kmers that map to the reference sequence are annotated (except repeats, see above). To annotate only split kmers whose middle base differs from the reference sequence, use the -v flag.
ska annotate [options] <kmer files>

Options:
-h          Print this help.
-f <file>   File of split kmer file names. These will be added to or
            used as an alternative input to the list provided on the
            command line.
-i          Include kmers in repetitive reference regions.
-o <file>   Prefix for output files. [Default = found]
-p          Include product in output.
-r <file>   Reference fasta/gff file name. [Required]
-v          Only output variant sites.
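When ska annotate is called from a pipeline, it can help to assemble the argument list in one place. The sketch below builds a command using only the options listed above. The helper name `build_annotate_command` and the file names `reference.gff`, `sample1.skf` and `sample2.skf` are hypothetical, and the sketch assumes a `ska` executable on the PATH.

```python
import shlex


def build_annotate_command(reference, kmer_files=(), file_list=None, prefix=None,
                           include_repeats=False, include_product=False,
                           variants_only=False):
    """Build an argument list for `ska annotate` from the documented options."""
    if not kmer_files and file_list is None:
        raise ValueError("provide split kmer files, a file list (-f), or both")
    command = ["ska", "annotate", "-r", str(reference)]  # -r is required
    if file_list is not None:
        command += ["-f", str(file_list)]  # file of split kmer file names
    if include_repeats:
        command.append("-i")  # include kmers in repetitive reference regions
    if prefix is not None:
        command += ["-o", str(prefix)]  # prefix for output files
    if include_product:
        command.append("-p")  # include product in output
    if variants_only:
        command.append("-v")  # only output variant sites
    command.extend(str(path) for path in kmer_files)
    return command


command = build_annotate_command(
    "reference.gff",
    ["sample1.skf", "sample2.skf"],
    prefix="annotated",
    include_product=True,
    variants_only=True,
)
print(shlex.join(command))
# ska annotate -r reference.gff -o annotated -p -v sample1.skf sample2.skf
```

The returned list can be passed to `subprocess.run(command, check=True)`; keeping the arguments as a list avoids shell quoting problems with file names that contain spaces.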