De novo Assembly - Golob-Minot/geneshot GitHub Wiki

To identify microbial genes, geneshot will:

  • perform de novo assembly of short reads with MEGAHIT,
  • identify protein-coding sequences with Prodigal, and
  • deduplicate similar gene sequences using MMseqs2.
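
As a rough illustration of the three steps above, the sketch below builds (but does not run) one command line per tool. The `assembly_commands` helper, the file names, the single-end `-r` input, and the threshold values are assumptions made for illustration; geneshot's actual invocations may differ.

```python
from typing import List


def assembly_commands(reads: str, workdir: str,
                      min_seq_id: float = 0.9,
                      min_cov: float = 0.5) -> List[List[str]]:
    """Return illustrative command lines for assembly, gene calling, and clustering.

    Hypothetical helper: paths and thresholds are placeholders, not geneshot's own.
    """
    # MEGAHIT writes its contigs to final.contigs.fa inside the output directory.
    contigs = f"{workdir}/megahit/final.contigs.fa"
    proteins = f"{workdir}/genes.faa"
    return [
        # 1. De novo assembly of single-end short reads.
        ["megahit", "-r", reads, "-o", f"{workdir}/megahit"],
        # 2. Protein-coding sequence prediction in metagenome mode,
        #    writing amino acid translations to `proteins`.
        ["prodigal", "-i", contigs, "-a", proteins, "-p", "meta"],
        # 3. Clustering of similar proteins with identity and coverage thresholds
        #    (expressed here as fractions between 0 and 1).
        ["mmseqs", "easy-linclust", proteins,
         f"{workdir}/clustered", f"{workdir}/tmp",
         "--min-seq-id", str(min_seq_id), "-c", str(min_cov)],
    ]
```

If the three tools are installed, each returned list could be passed to `subprocess.run(cmd, check=True)` in order.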

The following flags control these steps:

  • --phred_offset: The PHRED offset used by MEGAHIT, default: 33
  • --min_identity: Amino acid identity cutoff (percent) used by MMseqs2 to combine similar genes, default: 90
  • --min_coverage: Length coverage cutoff (percent) used by MMseqs2 to combine similar genes, default: 50
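
To make the defaults concrete, here is a minimal sketch of a helper that validates the three values and formats them as command-line arguments. The function name `geneshot_assembly_flags` is hypothetical, and the range checks rest on two assumptions: that the identity and coverage cutoffs are percentages between 0 and 100, and that only the two common PHRED encodings (33 and 64) are meaningful.

```python
from typing import List


def geneshot_assembly_flags(phred_offset: int = 33,
                            min_identity: float = 90,
                            min_coverage: float = 50) -> List[str]:
    """Format the assembly-related flags, using the documented defaults.

    Hypothetical helper; the validation rules are assumptions, not geneshot's own checks.
    """
    # Assumption: only the two common PHRED encodings are accepted.
    if phred_offset not in (33, 64):
        raise ValueError(f"unexpected PHRED offset: {phred_offset}")
    # Assumption: both cutoffs are percentages.
    for name, value in (("min_identity", min_identity),
                        ("min_coverage", min_coverage)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return [
        "--phred_offset", str(phred_offset),
        "--min_identity", str(min_identity),
        "--min_coverage", str(min_coverage),
    ]
```

The returned list is meant to be appended to whatever command launches geneshot; for example, `geneshot_assembly_flags(min_identity=95)` keeps the other two defaults and tightens only the identity cutoff.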