3. Classification - scubalaina/bathymodiolusvirome GitHub Wiki
As far as we know, viruses lack a universal marker gene. Efforts to classify viruses therefore typically involve examining how similar a whole viral genome is to those of viruses described in databases.
Tool: vContact (Bolduc et al. 2017): A CyVerse Discovery Environment application that classifies viruses using a similarity network, which clusters viral sequences together based on the presence or absence of gene clusters.
I'll admit, it's an intense workflow, but you can classify your viral sequences down to the genus level.
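To give a feel for the "presence or absence of gene clusters" idea, here is a toy Python sketch. It is not vContact's actual scoring or clustering; I'm swapping in plain Jaccard similarity as a stand-in, and the genome names, gene-cluster IDs (`PC1`, `PC2`, ...), and the `min_score` cutoff are all made up for illustration.

```python
from itertools import combinations

# Hypothetical presence/absence profiles:
# each genome maps to the set of gene-cluster IDs it contains.
profiles = {
    "contig_A": {"PC1", "PC2", "PC3", "PC4"},
    "contig_B": {"PC2", "PC3", "PC4", "PC5"},
    "ref_phage_X": {"PC8", "PC9"},
}


def jaccard(a, b):
    """Shared gene clusters divided by all gene clusters seen in either genome."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(profiles, min_score=0.5):
    """Return (genome1, genome2, score) for every pair scoring at least min_score."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        score = jaccard(profiles[g1], profiles[g2])
        if score >= min_score:
            edges.append((g1, g2, score))
    return edges


# Edges of the toy similarity network.
edges = similarity_edges(profiles)
```

In this toy example only `contig_A` and `contig_B` end up connected, because they share three of their five gene clusters. Genomes that share many gene clusters get linked, and groups of tightly linked genomes are what then get interpreted as clusters.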
IMPORTANT: You must provide a reference dataset that your viral sequences are compared to. I used ViralRefSeq.
ALSO IMPORTANT: You must translate the reference dataset and your viral sequences into amino acid sequences! I used Prodigal to predict the open reading frames and translate them.
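If you want to script the Prodigal step, a minimal Python sketch might look like the one below. It assumes `prodigal` is installed and on your PATH. The file names in the usage comment are hypothetical, and `-p meta` is my assumption for mixed viral contigs rather than a setting taken from this workflow, so switch it to `single` if that suits your data better.

```python
import subprocess


def build_prodigal_cmd(fasta_in, faa_out, procedure="meta"):
    """Build the Prodigal argument list for ORF prediction and translation."""
    return [
        "prodigal",
        "-i", fasta_in,    # input nucleotide FASTA
        "-a", faa_out,     # output protein translations (.faa)
        "-p", procedure,   # "meta" or "single" (assumption: "meta")
    ]


def run_prodigal(fasta_in, faa_out, procedure="meta"):
    """Run Prodigal; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_prodigal_cmd(fasta_in, faa_out, procedure), check=True)


# Example usage (hypothetical file names), once for each dataset:
# run_prodigal("my_viruses.fna", "my_viruses.faa")
# run_prodigal("ViralRefSeq.fna", "ViralRefSeq.faa")
```

Run it once on your own viral sequences and once on the reference dataset so that both end up as .faa files.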
Once you have the .faa of your viral sequences and the reference viral sequences, you can follow the instructions below.
Data preparation instructions are here
Applying vContact instructions are here