RdRp trimming with hacked usearch v12 - ababaian/serratus GitHub Wiki
Binary is here: https://drive5.com/downloads/usearch12_trim
Usage
A typical command line looks like this:
usearch -usearch_global extended_input.fa \
-id 0.01 \
-fulldp \
-maxaccepts 8 \
-maxrejects 32 \
-top_hit_only \
-db trimmed_reference.fa \
-userfields query+target+id+qtrimlo+qtrimhi \
-userout results.tsv \
-trimout trimmed_output.fa
FASTA output is written to the -trimout file; the TSV with coordinates is written to the -userout file.
TSV fields are: 1. query label, 2. reference label of top hit, 3. %id of semi-global alignment, 4. one-based start coordinate of alignment in query, 5. one-based end coordinate of alignment in query.
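As a rough illustration of how the -userout file might be consumed downstream, the sketch below reads the five columns listed above into records. The names TrimHit and read_trim_hits are hypothetical, the sample row is made up, and the length calculation assumes the end coordinate is inclusive, which the field description suggests but does not state.

```python
import csv
import io
from typing import NamedTuple


class TrimHit(NamedTuple):
    """One row of the -userout TSV (hypothetical helper type)."""
    query: str     # query label
    target: str    # reference label of the top hit
    pct_id: float  # %id of the semi-global alignment
    qtrimlo: int   # one-based start of the alignment in the query
    qtrimhi: int   # one-based end of the alignment in the query

    @property
    def trim_length(self) -> int:
        # Assumption: the end coordinate is inclusive.
        return self.qtrimhi - self.qtrimlo + 1


def read_trim_hits(handle):
    """Yield one TrimHit per non-empty row of a tab-separated handle."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        query, target, pct_id, lo, hi = row[:5]
        yield TrimHit(query, target, float(pct_id), int(lo), int(hi))


# Made-up row for illustration only; not real usearch output.
example = io.StringIO("query1\tref7\t42.5\t101\t400\n")
hits = list(read_trim_hits(example))
print(hits[0].query, hits[0].trim_length)
```

For a real run, the same function could be given an open file handle for results.tsv instead of the in-memory example.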
CVI benchmark results
A result is considered correct (a true positive) if the overlap between the gold-standard trim and the tested trim reaches a minimum percentage (minpctov). Results below are for minimum overlaps of 50%, 75% and 90%. CVI made four test+reference pairs, at 20%id, 50%id, 75%id and 90%id. N = number of test sequences, TP = number of trims meeting the overlap threshold; the final value on each line is TP as a percentage of N.
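The overlap test can be sketched as follows. This is only one plausible reading: the page does not say how the overlap percentage is normalized, so the sketch assumes it is the overlapping length divided by the gold-standard trim length, with one-based inclusive coordinates. The function names and the example coordinates are hypothetical.

```python
def overlap_pct(gold_lo, gold_hi, test_lo, test_hi):
    """Percent of the gold-standard trim covered by the tested trim.

    Assumptions: coordinates are one-based and inclusive, and the
    denominator is the gold-standard length. The benchmark may use a
    different normalization.
    """
    overlap = min(gold_hi, test_hi) - max(gold_lo, test_lo) + 1
    overlap = max(overlap, 0)  # disjoint intervals overlap by zero
    gold_len = gold_hi - gold_lo + 1
    return 100.0 * overlap / gold_len


def is_true_positive(gold, test, minpctov):
    """True if the tested trim overlaps the gold trim by at least minpctov percent."""
    return overlap_pct(gold[0], gold[1], test[0], test[1]) >= minpctov


# Illustrative coordinates only.
gold = (101, 400)        # 300 positions
test_close = (131, 400)  # covers 270 of them
test_loose = (201, 400)  # covers 200 of them

for minpctov in (50, 75, 90):
    print(minpctov,
          is_true_positive(gold, test_close, minpctov),
          is_true_positive(gold, test_loose, minpctov))
```

Under these assumptions the closer trim passes at all three thresholds, while the looser one passes only at 50%, which mirrors how the TP counts can only stay equal or fall as minpctov rises.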
=== minpctov=50 ===
20%id N=141, TP=129, TP=91.5%
50%id N=517, TP=517, TP=100.0%
75%id N=389, TP=389, TP=100.0%
90%id N=138, TP=138, TP=100.0%
=== minpctov=75 ===
20%id N=141, TP=123, TP=87.2%
50%id N=517, TP=508, TP=98.3%
75%id N=389, TP=385, TP=99.0%
90%id N=138, TP=138, TP=100.0%
=== minpctov=90 ===
20%id N=141, TP=108, TP=76.6%
50%id N=517, TP=494, TP=95.6%
75%id N=389, TP=383, TP=98.5%
90%id N=138, TP=138, TP=100.0%
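As a quick sanity check, the percentages above can be recomputed from the N and TP counts. The sketch below copies the counts from the listing and rounds to one decimal place; the names results and tp_rate are just for illustration.

```python
# (N, TP) pairs copied from the listing above, keyed by minpctov, then test set.
results = {
    50: {"20%id": (141, 129), "50%id": (517, 517),
         "75%id": (389, 389), "90%id": (138, 138)},
    75: {"20%id": (141, 123), "50%id": (517, 508),
         "75%id": (389, 385), "90%id": (138, 138)},
    90: {"20%id": (141, 108), "50%id": (517, 494),
         "75%id": (389, 383), "90%id": (138, 138)},
}


def tp_rate(n, tp):
    """True positives as a percentage of N, rounded to one decimal place."""
    return round(100.0 * tp / n, 1)


for minpctov, rows in results.items():
    print(f"=== minpctov={minpctov} ===")
    for name, (n, tp) in rows.items():
        print(f"{name} N={n}, TP={tp}, rate={tp_rate(n, tp):.1f}%")
```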