SKA annotate

The annotate subcommand locates split kmers in a reference genome sequence and writes their annotations to a VCF (v4.3) output file.

If the reference is supplied as a GFF file, split kmers matching CDS, tRNA or rRNA features are annotated in the INFO column of the VCF with the following information, where available:

Feature ID
Feature type (CDS, tRNA or rRNA)
Strand
Position of base in feature

For CDS features, the following are also included where available:

Locus tag
Systematic ID
Gene name
Position of amino acid in feature
Position of base in codon
Reference amino acid
Alternate amino acids (comma-separated list in the same order as the alt bases in the 5th (ALT) column of the VCF file)
Product (only output when the -p flag is used)
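
The positional fields listed above are related by simple arithmetic. The Python sketch below shows one way to derive the amino acid position and the position within the codon from a base position, and to pair alternate amino acids with ALT bases. It assumes 1-based positions counted from the first base of an uninterrupted CDS, which is not stated on this page. The function names are hypothetical and this is not SKA's own code.

```python
def codon_coordinates(base_pos):
    """Return (amino_acid_pos, pos_in_codon) for a base position in a CDS.

    Assumption: base_pos is 1-based and counted from the first base of
    the coding sequence, with no frame offset.
    """
    if base_pos < 1:
        raise ValueError("base_pos must be 1 or greater")
    amino_acid_pos = (base_pos - 1) // 3 + 1  # every 3 bases form one codon
    pos_in_codon = (base_pos - 1) % 3 + 1     # 1, 2 or 3
    return amino_acid_pos, pos_in_codon


def pair_alternates(alt_bases, alt_amino_acids):
    """Pair each ALT base with the amino acid at the same list position.

    Both arguments are comma-separated strings of equal length.
    """
    bases = alt_bases.split(",")
    amino_acids = alt_amino_acids.split(",")
    if len(bases) != len(amino_acids):
        raise ValueError("ALT bases and amino acids differ in length")
    return list(zip(bases, amino_acids))
```

For example, under these assumptions the fourth base of a CDS is the first base of the second codon.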

By default, split kmers that are repetitive in the reference sequence are not annotated. To annotate them, use the -i flag. Bases annotated on the basis of repetitive split kmers are labelled with the RR (repeat region) flag in the INFO column.
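
When post-processing the output, the RR flag can be detected by splitting the INFO column. The Python sketch below relies only on the general VCF convention that INFO entries are separated by semicolons and that flags appear without a value. The function names and the EXAMPLE key in the usage note are hypothetical, since the exact INFO keys written by SKA are not listed here.

```python
def parse_info(info):
    """Split a VCF INFO string into a set of flags and a dict of key=value pairs."""
    flags = set()
    values = {}
    if info in ("", "."):  # "." marks an empty INFO column in VCF
        return flags, values
    for entry in info.split(";"):
        if not entry:
            continue
        if "=" in entry:
            key, value = entry.split("=", 1)
            values[key] = value
        else:
            flags.add(entry)  # an entry without "=" is a flag
    return flags, values


def is_repeat_region(info):
    """Return True if the INFO string carries the RR flag."""
    flags, _ = parse_info(info)
    return "RR" in flags
```

With this sketch, `is_repeat_region("RR;EXAMPLE=1")` is true, while `is_repeat_region("EXAMPLE=RR")` is false because RR appears there as a value rather than a flag.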

By default, all split kmers that map to the reference sequence are annotated (except repeats, see above). To annotate only split kmers whose middle base differs from the reference sequence, use the -v flag.

Usage

ska annotate [options] <kmer files>

Options:
-h		Print this help.
-f <file>	File of split kmer file names. These will be added to or 
		used as an alternative input to the list provided on the 
		command line.
-i		Include kmers in repetitive reference regions.
-o <file>	Prefix for output files. [Default = found]
-p		Include product in output.
-r <file>	Reference fasta/gff file name. [Required]
-v		Only output variant sites.
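
When ska annotate is called from a script, the options above can be assembled programmatically. The Python sketch below builds an argument list using only the flags documented on this page. The helper name and all file names (reference.gff, sample1.skf, sample2.skf, annotated) are hypothetical placeholders, and the sketch only prints the command rather than running it.

```python
import shlex


def build_annotate_command(reference, kmer_files=(), file_list=None,
                           prefix=None, include_repeats=False,
                           include_product=False, variants_only=False):
    """Return an argument list for ska annotate (hypothetical helper)."""
    if not kmer_files and file_list is None:
        raise ValueError("provide kmer files, a file of file names, or both")
    command = ["ska", "annotate", "-r", reference]  # -r is required
    if file_list is not None:
        command += ["-f", file_list]
    if include_repeats:
        command.append("-i")
    if prefix is not None:
        command += ["-o", prefix]
    if include_product:
        command.append("-p")
    if variants_only:
        command.append("-v")
    command.extend(kmer_files)  # split kmer files come last
    return command


command = build_annotate_command(
    "reference.gff",
    ["sample1.skf", "sample2.skf"],
    prefix="annotated",
    include_product=True,
    variants_only=True,
)
print(shlex.join(command))
# prints: ska annotate -r reference.gff -o annotated -p -v sample1.skf sample2.skf
```

The returned list can be passed to `subprocess.run(command, check=True)` on a system where ska is installed.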

ska annotate - simonrharris/SKA GitHub Wiki
