External Gene Catalog - Golob-Minot/geneshot GitHub Wiki
If a user wishes to analyze a dataset using a set of gene sequences which have been
generated by some other method, or from some other dataset, they can do so using the
--gene_fasta flag.
The file indicated by this flag must be gzip-compressed and in FASTA format, with each record in the FASTA being a unique amino acid (protein) sequence.
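Before launching a run, it can help to confirm that a catalog meets the requirements described above. The following is a minimal sketch of such a check, not part of geneshot itself: the function names `read_fasta_gz` and `check_gene_catalog` are hypothetical, the accepted amino acid alphabet is an assumption, and "unique" is interpreted here as meaning that no two records share the same sequence.

```python
import gzip

# Assumed alphabet: the 20 standard amino acids plus common ambiguity
# codes (B, X, Z, J), U, O, and the stop symbol '*'.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWYBXZJUO*")


def read_fasta_gz(path):
    """Yield (header, sequence) tuples from a gzip-compressed FASTA file."""
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is None:
                raise ValueError("sequence data found before the first '>' header")
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_gene_catalog(path):
    """Return a list of problems found in the catalog (empty if none)."""
    # gzip files begin with the two magic bytes 0x1f 0x8b
    with open(path, "rb") as handle:
        if handle.read(2) != b"\x1f\x8b":
            return ["file is not gzip-compressed"]

    problems = []
    first_header_for = {}  # sequence -> header of the first record carrying it
    n_records = 0
    for header, seq in read_fasta_gz(path):
        n_records += 1
        seq = seq.upper()
        if not seq:
            problems.append(f"{header}: empty sequence")
            continue
        unexpected = sorted(set(seq) - PROTEIN_ALPHABET)
        if unexpected:
            problems.append(f"{header}: unexpected characters {''.join(unexpected)}")
        if seq in first_header_for:
            problems.append(f"{header}: duplicate of {first_header_for[seq]}")
        else:
            first_header_for[seq] = header
    if n_records == 0:
        problems.append("no FASTA records found")
    return problems
```

Calling `check_gene_catalog("genes.fasta.gz")` (a placeholder file name) returns an empty list when no problems are found. Note that this check cannot tell a nucleotide catalog from a protein one, because A, C, G and T are also valid amino acid codes.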
Even when an external gene catalog is specified in this way, geneshot will still
carry out de novo assembly. The reason is that the co-assembly of genes on the same
contig is used as information to speed up and optimize the CAG-creation process.
Although assembly is computationally slow, the quality of the results it yields is
a large part of the utility of the geneshot pipeline as a whole.