Serratus Lite - ababaian/serratus GitHub Wiki
Serratus-lite is the stand-alone virus-discovery workflow, for anyone interested in taking a look at their own data. It creates a Serratus-standard aligned read file (.pro for the protein search, .bam for the nucleotide search) and a virus summary file (.psummary or .summary, respectively).
Serratus is a platform for ultra-high-throughput sequence analysis, and virus discovery is one of its applications. It is not the most efficient means of identifying viruses (known or novel) in a dataset, but it is fast.
RNA Virus RdRP Search (translated nucleotide)
# Create a local diamond database file for `rdrp1`
diamond makedb --in rdrp1.fa -d rdrp1
# Set input reads (leave FQ2 empty for single-end data)
FQ1='<YOUR_FASTQ_1_FILE>'
FQ2='<YOUR_FASTQ_2_FILE (if paired, else blank)>'
# Run DIAMOND blastx (translated search); $FQ2 is left unquoted so an empty value adds no argument
diamond blastx \
-q $FQ1 $FQ2 \
-d rdrp1.dmnd \
--masking 0 \
--sensitive -s 1 \
-c1 -p1 -k1 -b 0.75 \
-f 6 qseqid qstart qend qlen qstrand \
sseqid sstart send slen \
pident evalue cigar \
qseq_translated full_qseq full_qseq_mate \
> $FQ1.pro
# Run summarizer (settings are read from the environment, so export them)
export SUMZER_SRA=$FQ1
export SUMZER_MAXALNS=1000000
export SUMZER_MAXX=100
export SUMZER_THROWX="NO"
cat $FQ1.pro \
| python2 serratus_psummarizer.py $FQ1.psummary /dev/null
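Each line of the .pro file is one DIAMOND alignment in tabular format (-f 6), with columns in the order requested above, so sseqid (the matched RdRP reference) is column 6. Below is a minimal sketch, assuming that column order, for a quick tally of alignments per reference alongside the summarizer; count_hits_per_reference is a hypothetical helper, not part of Serratus.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Serratus): tally alignments per RdRP
# reference in a DIAMOND tabular (.pro) file. Assumes the -f 6 column
# order used above, where column 6 is sseqid.
count_hits_per_reference() {
  local pro_file="$1"
  # Print "<count> <sseqid>" lines, most frequent reference first,
  # ties broken alphabetically by reference name.
  cut -f 6 "$pro_file" | sort | uniq -c | sort -k1,1nr -k2,2
}
```

For example, count_hits_per_reference "$FQ1.pro" | head lists the ten most frequently hit references.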
Vertebrate RefSeq virus search (nucleotide)
# Single-end (unpaired) example; for paired data,
# replace -U $FQ1 with -1 $FQ1 -2 $FQ2 in bowtie2
FQ1='<YOUR_FASTQ_1_FILE>'
# Build the bowtie2 index
bowtie2-build cov3ma.fa cov3ma
# Run bowtie2, keeping only mapped reads (samtools -F 4) in the BAM
bowtie2 --quiet --very-sensitive-local \
--rg-id na --rg LB:na --rg SM:na \
--rg PL:na --rg PU:na \
-x cov3ma -U $FQ1 | \
samtools view -b -F 4 - > $FQ1.bam
# Run summarizer (SUMZER_COMMENT is exported so the script can read it)
export SUMZER_COMMENT=$(echo sra="na",genome="cov3ma",version=200818,date=$(date +%y%m%d-%R))
summarizer="python3 serratus_summarizer.py /dev/stdin cov3ma.sumzer.tsv $FQ1.summary /dev/null"
# Summarize v2
samtools view $FQ1.bam | \
$summarizer
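The bowtie2 command above is written for single-end reads (-U); paired reads go in through -1 and -2 instead. A minimal sketch, assuming bash, of a hypothetical helper (bowtie2_read_args, not part of Serratus) that picks the right input flags depending on whether a mate file is supplied, so one script can handle both cases.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Serratus): print bowtie2's read-input
# arguments one per line: "-U FQ1" for single-end data, or
# "-1 FQ1 -2 FQ2" when a second (mate) FASTQ is given.
bowtie2_read_args() {
  local fq1="$1" fq2="${2:-}"
  if [ -n "$fq2" ]; then
    printf '%s\n' -1 "$fq1" -2 "$fq2"
  else
    printf '%s\n' -U "$fq1"
  fi
}

# Example use (needs bash 4+ for mapfile, plus bowtie2 and samtools):
#   mapfile -t READ_ARGS < <(bowtie2_read_args "$FQ1" "${FQ2:-}")
#   bowtie2 --quiet --very-sensitive-local -x cov3ma "${READ_ARGS[@]}" \
#     | samtools view -b -F 4 - > "$FQ1.bam"
```

Printing one argument per line and reading them back into an array keeps each file name as a single argument, as long as the names contain no newlines.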