SB Find repeats - mendessoares/BuddySuite GitHub Wiki
--find_repeats, -frp
Description
Search through all records and report two lists: IDs that occur more than once, and groups of records whose sequences are identical.
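The core check can be sketched in Python. This is a minimal illustration under stated assumptions, not SeqBuddy's implementation. Records are plain `(id, sequence)` tuples rather than the objects SeqBuddy uses internally. Sequences are compared exactly as written, gaps included. `find_repeats` is a hypothetical name. Groups come out in first-seen order, which may not match the order SeqBuddy prints.

```python
from collections import defaultdict


def find_repeats(records):
    """Return (duplicate_ids, duplicate_seq_groups) for (id, sequence) pairs.

    Hypothetical sketch for illustration, not SeqBuddy's code.
    """
    seen_ids = set()
    duplicate_ids = []           # IDs seen more than once, in first-repeat order
    by_seq = defaultdict(list)   # exact sequence string -> IDs carrying it
    for rec_id, seq in records:
        if rec_id in seen_ids and rec_id not in duplicate_ids:
            duplicate_ids.append(rec_id)
        seen_ids.add(rec_id)
        by_seq[seq].append(rec_id)  # assumption: gaps are kept, so comparison is exact
    # Only sequences shared by two or more records form a group
    groups = [ids for ids in by_seq.values() if len(ids) > 1]
    return duplicate_ids, groups


records = [("a", "MK-L"), ("b", "MKAL"), ("a", "MKAL"), ("c", "MK-L")]
dup_ids, groups = find_repeats(records)
print(dup_ids)  # ['a']
print(groups)   # [['a', 'c'], ['b', 'a']]
```

A repeated ID can also appear inside a sequence group, as `a` does here and as Mle-Panxα9 does in the example output.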
Argument
Columns (int)
Optional. The number of columns to arrange the reported IDs into (see Usage example 2).
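To illustrate what a column count does, here is a hypothetical helper that fills each row left to right with `columns` items. The name `format_columns` and the tab separator are assumptions for this sketch; SeqBuddy's own spacing may differ.

```python
def format_columns(ids, columns=1, sep="\t"):
    """Arrange IDs into rows of `columns` items each (hypothetical helper)."""
    if columns < 1:
        raise ValueError("columns must be a positive integer")
    # Slice the list into consecutive chunks of length `columns`
    rows = [ids[i:i + columns] for i in range(0, len(ids), columns)]
    return "\n".join(sep.join(row) for row in rows)


layout = format_columns(["Mle-Panxα9", "Mle-Panxα8", "Mle-Panxα6"], columns=2)
print(layout)
```

With `columns=2`, the three duplicate IDs fall into two rows, the same layout shown in Usage example 2.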
Examples
Input file: Mle-Panx-C_terms.nex
#NEXUS
begin data;
dimensions ntax=16 nchar=50;
format datatype=protein missing=? gap=-;
matrix
'Mle-Panxα12' -m--vidilsgf------------kgitpfkgitlddgwdqinrsfmfvl
'Mle-Panxα9' ----mldilskf------------kgvtpfkgitiddgwdqlnrsfmfvl
'Mle-Panxα10B' -m--rlsekstshdckacitrshnedcarrwgitiddgwdqlnrsfmfgl
'Mle-Panxα7A' -m--gveilfpi----------inratapiksvniddlssqlnrtfmfyl
'Mle-Panxα8' -m--vlevlalf------------prlapfkvitlddvwdqwnrsfmfim
'Mle-Panxα1' -mywifeicqei------------kraqscrkfaidgpfdwtnriimptl
'Mle-Panxα9' ----mldilskf------------kgvtpfkgitiddgwdqlnrsfmfvl
'Mle-Panxα2' -m--vldlisgs----------l-ngflkiksvsiddqwdqinrtylvmf
'Mle-Panxα5' -m--iywvwavf------------krmapfkvvtlddrwdqmnrsfmmpl
'Mle-Panxα4' -m--viellagy------------kglspfkdatvddswdqinrcyvfia
'Mle-Panxα3' ml--llgslgti------------knlsifkdlslddwldqmnrtfmfll
'Mle-Panxα6' -m--lleilanf------------kgatpfkeivlddkwdqinrcymfll
'Mle-Panxα8' ----mldilskf------------kgvtpfkgitiddgwdqlnrsfmfvl
'Mle-Panxα11' -m--lisslvqf------------srlspfkeitiddgwdqlnrsfmfvl
'Mle-Panxα10A' -m--rlsekstshdckacitrshnedcarrwgitiddgwdqlnrsfmfgl
'Mle-Panxα6' ----mldilskf------------kgvtpfkgitiddgwdqlnrsfmfvl
;
end;
Usage example 1
$: sb Mle-Panx-C_terms.nex -frp
Output
#### Records with duplicate IDs: ####
Mle-Panxα9
Mle-Panxα8
Mle-Panxα6
#### Records with duplicate sequences: ####
[Mle-Panxα10A, Mle-Panxα10B]
[Mle-Panxα9, Mle-Panxα9, Mle-Panxα8, Mle-Panxα6]
Usage example 2
$: sb Mle-Panx-C_terms.nex -frp 2
Output
#### Records with duplicate IDs: ####
Mle-Panxα9 Mle-Panxα8
Mle-Panxα6
#### Records with duplicate sequences: ####
[Mle-Panxα10A, Mle-Panxα10B], [Mle-Panxα9, Mle-Panxα9, Mle-Panxα8, Mle-Panxα6]