FLOM reference - ababaian/serratus GitHub Wiki
FLOM = Full Length Only Mega-genome.
Full-length = annotated as "complete genome" in GenBank.
Mega-genome = genomes from multiple virus families; unless otherwise stated, all families known to infect vertebrates.
As opposed to "pan-genome" = genomes from one family, Coronaviridae unless otherwise stated.
FLOM1 has two subsets:
- CoV (densely covered). The CoV subset is all CoV complete genomes clustered at 99% identity. These were extracted from the cov2m reference.
- non-CoV (full-length representative sequences only). The non-CoV subset is "representative" genomes for all families known to infect vertebrates except CoV.
NCBI divides virus genomes into "representatives" and "neighbors" (see the NCBI virus genome browser).
Search Term:
Viruses[Organism] AND srcdb_refseq[PROP] NOT wgs[PROP] NOT cellular organisms[ORGN] NOT AC_000001:AC_999999[PACC] AND ("vhost human"[Filter] AND "vhost vertebrates"[Filter])
Accessed: 2020-05-17. 510 entries retrieved. ~60 CoV entries removed.
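
The retrieval and CoV-removal steps above could be sketched roughly as follows. This is a minimal illustration, not the procedure actually used to build FLOM1: the choice of `db="nuccore"`, the helper names (`esearch_url`, `read_fasta`, `drop_cov`), and the idea of spotting CoV entries by the keyword "coronavirus" in the FASTA header are all assumptions made for the example. Only the search term itself comes from the text above.

```python
from urllib.parse import urlencode

# Search term copied verbatim from the text above.
SEARCH_TERM = (
    'Viruses[Organism] AND srcdb_refseq[PROP] NOT wgs[PROP] '
    'NOT cellular organisms[ORGN] NOT AC_000001:AC_999999[PACC] '
    'AND ("vhost human"[Filter] AND "vhost vertebrates"[Filter])'
)

# NCBI E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="nuccore", retmax=1000):
    """Build an esearch URL for the term (db="nuccore" is an assumption)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_cov(records, keyword="coronavirus"):
    """Keep records whose header lacks the keyword (hypothetical CoV filter)."""
    return [(h, s) for h, s in records if keyword not in h.lower()]
```

For example, `drop_cov(read_fasta(open("retrieved.fasta")))` would return the non-CoV records from a hypothetical download named `retrieved.fasta`. A header keyword match is only a rough filter: some coronavirus genomes have names that do not contain the word "coronavirus", so a taxonomy-based filter would be more reliable in practice.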