Camaroon gut - ababaian/serratus GitHub Wiki
Cameroon Gut Virome - SRR7892437
Sub-assembly Virome
Standard viral discovery analysis (including within the Serratus project) relies heavily on assembly to identify known and unknown viruses. An important limitation of this approach is that it fails to identify rare viruses within a sample, that is, cases where read coverage of the genome or marker gene is too low for assembly to succeed (~1-2x). Low coverage does not preclude viral identification, but it makes it challenging to distinguish the virus from noise.
Take Home: Assembly will always be less sensitive than alignment for detecting viruses with high identity to a reference sequence (>90% nucleotide identity).
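To make the coverage argument concrete, here is a rough sketch under the idealized Lander-Waterman assumption that reads land uniformly at random along the genome (real viromes do not strictly satisfy this). Under that assumption, the expected fraction of bases covered by at least one read at mean depth c is 1 - e^(-c). The function name is illustrative and is not part of Serratus.

```python
import math


def expected_covered_fraction(depth: float) -> float:
    """Expected fraction of bases hit by at least one read at a mean depth.

    ASSUMPTION: read starts are uniformly random (Poisson model). This is an
    idealization for intuition, not how Serratus computes anything.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    # 0.4x is the depth reported for this library; 1x and 2x bracket the
    # range where assembly typically starts to struggle.
    for depth in (0.4, 1.0, 2.0):
        frac = expected_covered_fraction(depth)
        print(f"{depth:.1f}x mean depth -> ~{frac:.0%} of bases covered")
```

Under this model, roughly a third of the genome has any read at 0.4x, and a substantial share of bases is still uncovered at 1-2x, so contigs cannot span the genome even though each individual read can still be aligned to a close reference.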
In the library SRR7892437 - Gut virome of Cameroonians living in close proximity with bats, we see an example of RNA from fruit bat coronaviruses appearing in a human gut. We cannot conclude whether this is a zoonotic infection, but it does confirm that a human-coronavirus interface event has occurred.
There are 81 CoV reads, 66 of which aligned to MG693168.1 - Bat coronavirus isolate CMR704-P12. There is not yet a robust method to distinguish a case such as this from background noise (from experience, 88 CoV reads spread across multiple bins is often a good indicator that it is not a false positive).
The .summary for this virus is given below. Coronaviridae has a low score=28, so it is unreasonable to expect assembly of this genome (at 0.4x coverage) to succeed.
# SRR7892437.summary file
famcvg=_u__u.:w:u__:u:__uuu:::ww
fam=Coronaviridae
topname=Bat coronavirus isolate CMR900 ORF1a, ORF1b, Spike protein, ORF3, Membrane protein, capsid, hypothetical protein ORFx, and hypothetical protein ORFy genes, complete cds
score=28
pctid=98
depth=0.4
aln=81
glb=61
len=30000
top=MG693169.1
topscore=4
toplen=113
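As a minimal sketch of how these fields relate to each other, the snippet below parses the key=value lines as they are laid out on this page, counts how many coverage bins contain reads, and recomputes an approximate depth from the read count. Two things are assumptions rather than documented behaviour: that `_` in `famcvg` marks a bin with no reads, and that every read is 150 bp (taken from the IGV record on this page). All function names are hypothetical.

```python
# A subset of the summary fields shown above, one key=value pair per line.
SUMMARY_LINES = """\
famcvg=_u__u.:w:u__:u:__uuu:::ww
fam=Coronaviridae
score=28
pctid=98
depth=0.4
aln=81
glb=61
len=30000
""".splitlines()


def parse_summary_fields(lines):
    """Parse key=value lines into a dict, splitting on the first '=' only."""
    fields = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key] = value
    return fields


def covered_bins(famcvg, empty_char="_"):
    """Return (bins with signal, total bins).

    ASSUMPTION: '_' marks a bin with no reads; all other symbols mean >= 1.
    """
    return sum(1 for ch in famcvg if ch != empty_char), len(famcvg)


def approx_depth(n_reads, read_len, genome_len):
    """Mean depth if every read aligned over its full length."""
    return n_reads * read_len / genome_len


fields = parse_summary_fields(SUMMARY_LINES)
hit, total = covered_bins(fields["famcvg"])
# ASSUMPTION: 150 bp reads, as in the IGV record for this library.
depth = approx_depth(int(fields["aln"]), 150, int(fields["len"]))
print(f"{fields['fam']}: {hit}/{total} bins with reads, ~{depth:.1f}x depth")
```

The recomputed depth agrees with the reported depth=0.4, which suggests the summary value is simply aligned bases divided by the nominal genome length; treat that as an inference from this one example.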
Inspecting the alignment
Opening the .bam file in IGV shows this quite clearly: the majority of the genome is missing. If you BLASTn the reads, they map with 100% identity to MG693172.1, a CoV isolated from Eidolon helvum feces in the publication "Cameroonian fruit bats harbor divergent viruses, including rotavirus H, bastroviruses, and picobirnaviruses using an alternative genetic code" (Yinda et al., 2018).
Read name = SRR7892437.3504150
Sample = SAMN10080885
Library = SRR7892437
Read length = 150bp
Mapping = Primary @ MAPQ 11
Cigar = 150M
Mate is mapped = yes
Mate start = MG693168.1:9392 (-)
Insert size = 207
Read sequence = TCTACAGGTGGTGATGTTGTTTATCAACCGCCACGCTGTAGTGTGACTGCAGCTTCTCTACAGGGAGGTTTAGCTAAGATGGCTCATCCATCTGGTCCTGTAGAGAAATGTGTTGTTAAAGTGACTTATGGCACTATGACACTTAACGGT
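The fields in this record can be sanity-checked with a short sketch. It assumes the standard SAM conventions: the CIGAR operations M, I, S, = and X consume read bases, and MAPQ is nominally -10 * log10 of the probability that the mapping position is wrong. Aligners only approximate that probability, so the number is indicative. The function names are illustrative.

```python
import re

# CIGAR operations that consume bases of the read (per the SAM specification).
QUERY_CONSUMING_OPS = set("MIS=X")
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar: str) -> int:
    """Number of read bases implied by a CIGAR string."""
    tokens = CIGAR_TOKEN.findall(cigar)
    # Reject strings containing anything the token pattern did not match.
    if "".join(n + op for n, op in tokens) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(n) for n, op in tokens if op in QUERY_CONSUMING_OPS)


def mapq_to_error_probability(mapq: int) -> float:
    """Invert MAPQ = -10 * log10(P(mapping position is wrong))."""
    return 10 ** (-mapq / 10)


# The read sequence from the record above, joined across its line breaks.
READ_SEQ = (
    "TCTACAGGTGGTGATGTTGTTTATCAACCGCCACGCTG"
    "TAGTGTGACTGCAGCTTCTCTACAGGGAGGTTTAGCTAAGATGGCTCATCCATC"
    "TGGTCCTGTAGAGAAATGTGTTGTTAAAGTGACTTATGGCACTATGACACTTAA"
    "CGGT"
)

print("sequence length matches CIGAR:",
      len(READ_SEQ) == cigar_query_length("150M"))
print(f"nominal P(wrong position) at MAPQ 11: "
      f"{mapq_to_error_probability(11):.3f}")
```

A CIGAR of 150M means the whole read aligned without gaps or clipping, while the modest MAPQ reflects that closely related bat coronavirus references compete for the same read.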
Interpreting the data in Serratus depends on the question you are asking. Assembly is robust when it works, but if you're interested in studying virus-host interactions, it may often be too high a bar to set.