Serratus Lite - ababaian/serratus GitHub Wiki

Serratus-lite is the isolated virus-discovery workflow, for when you want to run it on your own data. It creates a Serratus-standard aligned-read file (.pro or .bam) and a virus summary file (.summary).

Serratus is a platform for ultra-high-throughput sequence analysis, and virus discovery is one of its applications. It is not the most efficient means of identifying viruses (known or novel) in a dataset, but it is fast.

RNA Virus RdRP Search (translated nucleotide)

# Create a local diamond database file for `rdrp1`
diamond makedb --in rdrp1.fa -d rdrp1

# Set input FASTQ files
FQ1='<YOUR FASTQ_1 FILE>'
FQ2='<YOUR FASTQ_2 FILE (if paired, else blank)>'

# Run diamond
diamond blastx \
  -q $FQ1 $FQ2 \
  -d rdrp1.dmnd \
  --masking 0 \
  --sensitive -s 1 \
  -c1 -p1 -k1 -b 0.75 \
  -f 6 qseqid  qstart qend qlen qstrand \
       sseqid  sstart send slen \
       pident evalue cigar \
       qseq_translated full_qseq full_qseq_mate \
  > $FQ1.pro

# Run summarizer
# Exported so the settings are visible to the summarizer process
export SUMZER_SRA=$FQ1
export SUMZER_MAXALNS=1000000
export SUMZER_MAXX=100
export SUMZER_THROWX="NO"

cat $FQ1.pro \
  | python2 serratus_psummarizer.py $FQ1.psummary /dev/null
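
For a quick look at which RdRP references attracted hits, the tabular `.pro` output can also be tallied directly. This is a minimal sketch, assuming the column order requested by the `-f 6` line above, where `sseqid` is the sixth tab-separated field. The function name `count_hits_by_subject` is hypothetical and not part of Serratus.

```shell
# Hypothetical helper (not part of Serratus): count alignments per subject.
# Assumes DIAMOND tabular output with sseqid in column 6, as requested above.
# Prints "<count><TAB><sseqid>", highest count first.
count_hits_by_subject() {
  awk -F '\t' '{ n[$6]++ } END { for (s in n) printf "%d\t%s\n", n[s], s }' "$1" \
    | sort -t "$(printf '\t')" -k1,1nr -k2,2
}

# Usage:
#   count_hits_by_subject "$FQ1.pro" | head
```

This only counts alignment rows per reference; it does not replace the summarizer output.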

Vertebrate RefSeq virus search (nucleotide)

# For unpaired data (as shown here), pass the reads with -U.
# For paired data, use the -1 and -2 flags in bowtie2 instead.
FQ1='<YOUR FASTQ_1 FILE>'

# bowtie2 build Index:
bowtie2-build cov3ma.fa cov3ma

# run bowtie2
  bowtie2 --quiet --very-sensitive-local \
    --rg-id na --rg LB:na --rg SM:na \
    --rg PL:na --rg PU:na \
    -x cov3ma -U $FQ1 | \
    samtools view -b -F 4 - > $FQ1.bam


# run Summarizer
export SUMZER_COMMENT="sra=na,genome=cov3ma,version=200818,date=$(date +%y%m%d-%R)"
summarizer="python3 serratus_summarizer.py /dev/stdin cov3ma.sumzer.tsv $FQ1.summary /dev/null"

# Summarize v2
samtools view $FQ1.bam | \
$summarizer
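
The bowtie2 command above aligns unpaired reads with `-U`. For paired data, bowtie2 takes the two files through `-1` and `-2` instead. The following is a minimal sketch of switching between the two modes depending on whether a second FASTQ file is given. The helper name `bowtie2_inputs` is hypothetical, and the rest of the bowtie2 command is assumed to stay as shown above.

```shell
# Hypothetical helper: print the bowtie2 input flags for the given FASTQ file(s).
# With one argument (or an empty second argument) reads are treated as unpaired.
bowtie2_inputs() {
  if [ -n "${2:-}" ]; then
    printf '%s\n' "-1 $1 -2 $2"
  else
    printf '%s\n' "-U $1"
  fi
}

# Usage (word splitting is intentional, so file names must not contain spaces):
#   bowtie2 --quiet --very-sensitive-local -x cov3ma $(bowtie2_inputs "$FQ1" "$FQ2") \
#     | samtools view -b -F 4 - > "$FQ1.bam"
```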