Viral contigs containing RdRP - ababaian/serratus GitHub Wiki

Download

https://serratus-public.s3.amazonaws.com/rdrp_contigs/rdrp_contigs.tar.gz (1 GB tarball)

The tarball contains two files:
rdrp_contigs.fa (2.9 GB FASTA): Serratus contigs with a palmprint detected by palmscan
rdrp_contigs.tsv (265 MB tab-separated text): contig annotations, with the fields listed below
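A minimal Python sketch for unpacking the two files once the tarball has been downloaded. It assumes the tarball was saved locally and is gzip-compressed (as the .tar.gz suffix suggests); the helper name extract_rdrp_contigs is hypothetical. Members are matched by base name, so the sketch works whether or not the archive stores the files under a directory.

```python
import shutil
import tarfile
from pathlib import Path
from typing import List

WANTED = {"rdrp_contigs.fa", "rdrp_contigs.tsv"}


def extract_rdrp_contigs(tarball_path: str, dest_dir: str) -> List[Path]:
    """Copy rdrp_contigs.fa and rdrp_contigs.tsv out of the tarball into dest_dir.

    Hypothetical helper: any directory prefix inside the archive is ignored,
    which also prevents writing files outside dest_dir.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(tarball_path, "r:gz") as tar:
        for member in tar:
            name = Path(member.name).name
            if not member.isfile() or name not in WANTED:
                continue
            out_path = dest / name
            src = tar.extractfile(member)
            with src, open(out_path, "wb") as out:
                # Stream in chunks rather than reading the ~2.9 GB FASTA into memory.
                shutil.copyfileobj(src, out)
            extracted.append(out_path)
    return extracted


# Example (assumes the tarball was downloaded first, e.g. with
# urllib.request.urlretrieve(<URL above>, "rdrp_contigs.tar.gz")):
# extract_rdrp_contigs("rdrp_contigs.tar.gz", "rdrp_contigs")
```

Copying with extractfile instead of calling extractall keeps control over where each file lands and skips anything else that might be in the archive.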

Classification as viral and known/novel

A contig is classified as viral if (1) it has a high-confidence RdRP according to palmscan, and (2) it has an E-value <= 1e-6 in a diamond search of the named viral subset of PalmDB. Otherwise, it is undetermined (undet).

A contig is classified as known if its palmprint has >= 90% identity to a hit in a diamond search of the NCBI non-redundant protein database (NR); otherwise, it is novel.
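The two rules above combine independently into the four categories. A minimal sketch in Python, assuming the palmscan and diamond results have already been reduced to three values per contig: high_confidence_rdrp is a hypothetical boolean standing in for palmscan's high-confidence call (it is not a column of rdrp_contigs.tsv), and None stands for "no hit" in that database.

```python
from typing import Optional

NV_MAX_EVALUE = 1e-6  # diamond E-value cutoff against the named viral (NV) subset of PalmDB
NR_MIN_PCTID = 90.0   # percent identity cutoff against NCBI NR


def classify_contig(high_confidence_rdrp: bool,
                    nv_evalue: Optional[float],
                    nr_pctid: Optional[float]) -> str:
    """Return one of viral/novel, viral/known, undet/novel, undet/known.

    None for nv_evalue or nr_pctid means no hit was found in that database.
    """
    # Viral requires BOTH a high-confidence RdRP and a significant NV hit.
    viral = (high_confidence_rdrp
             and nv_evalue is not None
             and nv_evalue <= NV_MAX_EVALUE)
    # Known/novel depends only on identity to NR.
    known = nr_pctid is not None and nr_pctid >= NR_MIN_PCTID
    return ("viral" if viral else "undet") + "/" + ("known" if known else "novel")
```

For example, classify_contig(True, 1e-30, 95.0) gives viral/known, while classify_contig(True, 1e-3, 45.0) gives undet/novel because the NV hit is not significant enough.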

Contig counts by category

1016347 viral/novel
326942 viral/known
96359 undet/novel
6197 undet/known
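Counts like these can be tallied directly from the Category column (field 5) of rdrp_contigs.tsv. A minimal Python sketch, assuming the file has no header line (if it does, the header's own "Category" value shows up once in the result); the function name count_categories is hypothetical.

```python
from collections import Counter

CATEGORY_INDEX = 4  # field 5 ("Category") in rdrp_contigs.tsv, as a 0-based index


def count_categories(tsv_path: str) -> Counter:
    """Tally rows per Category, streaming the TSV one line at a time."""
    counts: Counter = Counter()
    with open(tsv_path) as fh:
        for line in fh:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) > CATEGORY_INDEX:  # skip blank or truncated lines
                counts[fields[CATEGORY_INDEX]] += 1
    return counts
```

Streaming line by line keeps memory use flat for the 265 MB file; count_categories("rdrp_contigs.tsv").most_common() lists categories from most to least frequent, and should match the counts above if each contig occupies one row.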

Tentative taxonomy assignment

Tentative taxonomies were predicted by a simple consensus method. The usearch_global command in usearch was used to search each palmprint against the named viral (NV) subset of PalmDB, release 2021-03-14 (file named.fa.gz). The top 10 hits for each palmprint were considered, and the majority name at each rank was assigned; if there was no majority, no name was assigned at that rank. Identity thresholds were applied per rank: phylum=0%, class=30%, order=30%, family=40%, genus=70%, species=90%. If a hit had identity below the threshold for a rank, its name at that rank was excluded.
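The consensus step can be sketched in Python as below. This is an illustration of the rule as described, not the code used to build the table; the hit representation (percent identity plus a rank-to-name dict, sorted best-first) is an assumption, and so is the reading of "majority": here a name must be supported by more than half of the top hits considered, with excluded hits counting as non-votes.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

RANKS = ["phylum", "class", "order", "family", "genus", "species"]
# Minimum percent identity for a hit's name to count at each rank.
MIN_PCTID = {"phylum": 0.0, "class": 30.0, "order": 30.0,
             "family": 40.0, "genus": 70.0, "species": 90.0}

Hit = Tuple[float, Dict[str, str]]  # (percent identity, {rank: name})


def consensus_taxonomy(hits: List[Hit], top_n: int = 10) -> Dict[str, Optional[str]]:
    """Majority-vote taxonomy over the top_n hits (hits must be sorted best-first)."""
    top = hits[:top_n]
    result: Dict[str, Optional[str]] = {}
    for rank in RANKS:
        # Exclude a hit's name at this rank if its identity is below the threshold.
        names = [tax[rank] for pctid, tax in top
                 if pctid >= MIN_PCTID[rank] and tax.get(rank)]
        result[rank] = None
        if names:
            name, votes = Counter(names).most_common(1)[0]
            # Assumption: majority = more than half of all hits considered.
            if votes > len(top) / 2:
                result[rank] = name
    return result
```

Under this reading, a species name needs at least 6 of the top 10 hits at >= 90% identity agreeing on it. If only the hits surviving the threshold should form the denominator, replace len(top) with len(names).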

Fields in rdrp_contigs.tsv

Col  Field         Description
1    Contig        FASTA label of the contig
2    SRA           SRA accession
3    Length        Contig length
4    Depth         Mean coverage (read depth)
5    Category      One of viral/novel, viral/known, undet/novel, undet/known
6    NR_label      Label of top hit in the NCBI non-redundant protein database (NR)
7    NR_pctid      Percent identity of top hit in NR
8    NR_evalue     E-value of top hit in NR
9    NV_label      Label of top hit in the named viral (NV) subset of PalmDB
10   NV_pctid      Percent identity of top hit in NV
11   NV_evalue     E-value of top hit in NV
12   PalmDB_label  Label of top hit to a PalmDB species-like OTU (sOTU)
13   PalmDB_pctid  Percent identity to the PalmDB sOTU
14   phylum        Tentative phylum
15   class         Tentative class
16   order         Tentative order
17   family        Tentative family
18   genus         Tentative genus
19   species       Tentative species
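A minimal Python reader for rdrp_contigs.tsv using the 19 fields above. It is a sketch under stated assumptions: the column order is as listed, there is no header line unless has_header=True is passed, and any numeric field that is empty or not a number (for example, when a contig has no NR hit) becomes None, as do empty text fields. The names read_rdrp_contigs and FIELDS are hypothetical.

```python
import csv
from typing import Callable, Dict, Iterator, Optional, Union

FIELDS = ["Contig", "SRA", "Length", "Depth", "Category",
          "NR_label", "NR_pctid", "NR_evalue",
          "NV_label", "NV_pctid", "NV_evalue",
          "PalmDB_label", "PalmDB_pctid",
          "phylum", "class", "order", "family", "genus", "species"]
INT_FIELDS = {"Length"}
FLOAT_FIELDS = {"Depth", "NR_pctid", "NR_evalue",
                "NV_pctid", "NV_evalue", "PalmDB_pctid"}

Value = Union[str, int, float, None]


def _number(text: str, cast: Callable[[str], Union[int, float]]) -> Optional[Union[int, float]]:
    """Convert text with cast, returning None if it is empty or not numeric."""
    try:
        return cast(text)
    except ValueError:
        return None


def read_rdrp_contigs(tsv_path: str, has_header: bool = False) -> Iterator[Dict[str, Value]]:
    """Yield one dict per row, keyed by the field names in the table above."""
    with open(tsv_path, newline="") as fh:
        # QUOTE_NONE: treat quote characters in labels as ordinary text.
        reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        if has_header:
            next(reader, None)
        for row in reader:
            if not row:
                continue
            row = row + [""] * (len(FIELDS) - len(row))  # pad short rows
            record: Dict[str, Value] = {}
            for name, text in zip(FIELDS, row):
                if name in INT_FIELDS:
                    record[name] = _number(text, int)
                elif name in FLOAT_FIELDS:
                    record[name] = _number(text, float)
                else:
                    record[name] = text or None  # empty label or taxon -> None
            yield record
```

Because it is a generator, the file is streamed rather than loaded whole. For example, [r["Contig"] for r in read_rdrp_contigs("rdrp_contigs.tsv") if r["Category"] == "viral/novel" and r["family"]] collects novel viral contigs that received a tentative family.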