FLOM reference - ababaian/serratus GitHub Wiki

FLOM = Full Length Only Mega-genome.

Full-length = annotated as "complete genome" in Genbank.

Mega-genome = genomes from multiple virus families; unless otherwise stated, all families known to infect vertebrates.

This is in contrast to a "pan-genome" = genomes from one family; Coronaviridae unless otherwise stated.

FLOM1 has two subsets:

  1. CoV (densely covered). The CoV subset is all CoV complete genomes clustered at 99% identity. These were extracted from the cov2m reference.

  2. non-CoV (full-length representative sequences only).

The non-CoV subset consists of "representative" genomes for all families known to infect vertebrates, except CoV.

NCBI divides virus genomes into "representatives" and "neighbors" (see the NCBI virus genome browser).


Search Term: Viruses[Organism] AND srcdb_refseq[PROP] NOT wgs[PROP] NOT cellular organisms[ORGN] NOT AC_000001:AC_999999[PACC] AND ("vhost human"[Filter] AND "vhost vertebrates"[Filter])

Accessed: 2020-05-17. 510 entries retrieved; ~60 CoV entries removed.
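
The retrieval above could be reproduced roughly as follows. This is a minimal sketch, not the procedure actually used for FLOM1: it assumes the search was run against the NCBI Nucleotide database (`db="nuccore"`) through the E-utilities ESearch endpoint, and the `retmax` value and the title-based `drop_cov` filter are hypothetical, since this page does not say how the CoV entries were removed. The search term itself is copied verbatim from this page.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Search term copied verbatim from this page.
FLOM_TERM = (
    'Viruses[Organism] AND srcdb_refseq[PROP] NOT wgs[PROP] '
    'NOT cellular organisms[ORGN] NOT AC_000001:AC_999999[PACC] '
    'AND ("vhost human"[Filter] AND "vhost vertebrates"[Filter])'
)


def build_esearch_url(term, db="nuccore", retmax=1000):
    """Build an ESearch URL for `term`.

    Assumptions: the database (`nuccore`) and `retmax` are not stated
    on this page; they are chosen here for illustration only.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def drop_cov(records):
    """Drop records whose title mentions a coronavirus.

    Hypothetical filter: each record is a dict with "accession" and
    "title" keys. The real CoV removal step may have worked differently.
    """
    return [r for r in records if "coronavirus" not in r["title"].lower()]


if __name__ == "__main__":
    # Print the URL only; fetching it requires network access.
    print(build_esearch_url(FLOM_TERM))

    # Made-up records, for illustration only.
    example = [
        {"accession": "EXAMPLE_1", "title": "Bat coronavirus, complete genome"},
        {"accession": "EXAMPLE_2", "title": "Zika virus, complete genome"},
    ]
    print([r["accession"] for r in drop_cov(example)])
```

Note that a search run today will likely return a different number of entries than the 510 retrieved on the access date, because RefSeq is updated continuously.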