AB Rename IDs - mendessoares/BuddySuite GitHub Wiki
--rename_ids, -ri
Description
Modify record identifiers by searching for simple strings or more complex regular expressions. Each match will be replaced with your substitution string within the ID.
Arguments
Query ( regex )
The query is a regular expression that searches inside every ID for any sub-string matches. Only the part that is matched will be replaced, not the entire ID. If you would like to match the entire ID, prefix the search with ^ and suffix it with $; these are the 'start of string' and 'end of string' anchors, respectively (see example 3).
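The difference between a sub-string match and an anchored match can be sketched with Python's standard `re` module. This is only an illustration of the regular expression behaviour, not BuddySuite's own code; the two IDs are taken from the example input file, and the replacement string `X` is arbitrary.

```python
import re

ids = ["Dme-Panxδ1", "Dme-Panxδ11"]

# Unanchored: the pattern is found inside both IDs, so both are changed,
# and only the matched part is replaced.
partial = [re.sub("Panxδ1", "X", i) for i in ids]

# Anchored with ^ and $: only an ID that matches in its entirety is changed.
exact = [re.sub("^Dme-Panxδ1$", "X", i) for i in ids]

print(partial)  # ['Dme-X', 'Dme-X1']
print(exact)    # ['X', 'Dme-Panxδ11']
```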
Substitution ( str )
All matches to the query will be replaced with this exact string. If you want to retain part of the match in the substitution, enclose the relevant part of the query in parentheses () and refer to it with a backslash followed by a number (e.g., \1). Use '\1' for the first set of parentheses, '\2' for the second, etc. (see example 4).
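To see how numbered groups carry part of the match into the replacement, here is a minimal sketch using Python's `re.sub` with the query and substitution from example 4. It assumes Python-style backreferences and is not a copy of the tool's internals; the variable names are illustrative.

```python
import re

# Group 1 captures the first two characters; group 2 captures the
# Greek letter plus the trailing number.
query = r"^(..)e-Panx([αδ][0-9]+)$"
substitution = r"\1-Inx\2"

renamed = re.sub(query, substitution, "Mle-Panxα9")
print(renamed)  # Ml-Inxα9
```

Everything outside the parentheses ("e-Panx" here) is discarded, because it is part of the match but is not referenced in the substitution.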
Max replacements ( int )
Optional. If a pattern is present in an ID more than once but only some of those matches should be replaced, set a maximum number of replacements (see example 5). The default is '0', which corresponds to 'all'. To match/replace from right-to-left, instead of left-to-right, provide a negative number (see example 6).
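One way to picture the effect of the max replacements value is the sketch below. The `rename` function is a hypothetical helper written for this illustration, not part of BuddySuite. It also assumes that right-to-left replacement means "take the right-most of the non-overlapping matches", which may differ from how the tool actually scans when matches could overlap.

```python
import re

def rename(seq_id, query, substitution, max_replace=0):
    """Hypothetical helper: replace up to |max_replace| matches.

    0 replaces every match, a positive value counts from the left,
    and a negative value counts from the right.
    """
    matches = list(re.finditer(query, seq_id))
    if max_replace > 0:
        matches = matches[:max_replace]
    elif max_replace < 0:
        matches = matches[max_replace:]
    # Apply the edits from the right so earlier spans keep their positions.
    for m in reversed(matches):
        seq_id = seq_id[:m.start()] + m.expand(substitution) + seq_id[m.end():]
    return seq_id

print(rename("Dme-Panxδ1", "[a-z]", "?", 2))   # D??-Panxδ1
print(rename("Dme-Panxδ1", "[a-z]", "?", -2))  # Dme-Pa??δ1
```

Note that `[a-z]` does not match the Greek letter δ, which is why it is left untouched in both calls.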
Examples
Input file: C-terms.physr
4 50
Dme-Panxδ1 ----YKLLGSLKSYLKWQ-IQTDNAVFRLHNSFTTVLLLTCSLIITATQY
Dme-Panxδ11 ----MDVFGSVKGLLKID--QVDNNVFRMHYKATVIILIAFSLLVTSRQY
Dme-Panxδ3 ------------GFIKID-----NMVFRCHYRIT-AILFTCCIIVTANNL
Dme-Panxδ4 ----MAAVKPLSKYLQFK-VHIYDAIFTLHSKVTVALLLACTFLLSSKQY
4 50
Mle-Panxα1 MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
Mle-Panxα5 --MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGII
Mle-Panxα6 --MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQY
Mle-Panxα9 ---MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQY
Usage example 1
Simple replacement
$: alb C-terms.physr -ri 'Mle' 'Mnemiopsis'
Output
4 50
Dme-Panxδ1 ----YKLLGSLKSYLKWQ-IQTDNAVFRLHNSFTTVLLLTCSLIITATQY
Dme-Panxδ11 ----MDVFGSVKGLLKID--QVDNNVFRMHYKATVIILIAFSLLVTSRQY
Dme-Panxδ3 ------------GFIKID-----NMVFRCHYRIT-AILFTCCIIVTANNL
Dme-Panxδ4 ----MAAVKPLSKYLQFK-VHIYDAIFTLHSKVTVALLLACTFLLSSKQY
4 50
Mnemiopsis-Panxα1 MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
Mnemiopsis-Panxα5 --MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGII
Mnemiopsis-Panxα6 --MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQY
Mnemiopsis-Panxα9 ---MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQY
Usage example 2
Incorporate a regular expression
$: alb C-terms.physr -ri 'Panx[αδ]1' 'Panx?'
Output
4 50
Dme-Panx? ----YKLLGSLKSYLKWQ-IQTDNAVFRLHNSFTTVLLLTCSLIITATQY
Dme-Panx?1 ----MDVFGSVKGLLKID--QVDNNVFRMHYKATVIILIAFSLLVTSRQY
Dme-Panxδ3 ------------GFIKID-----NMVFRCHYRIT-AILFTCCIIVTANNL
Dme-Panxδ4 ----MAAVKPLSKYLQFK-VHIYDAIFTLHSKVTVALLLACTFLLSSKQY
4 50
Mle-Panx? MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
Mle-Panxα5 --MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGII
Mle-Panxα6 --MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQY
Mle-Panxα9 ---MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQY
Usage example 3
Match an ID exactly
$: alb C-terms.physr -ri '^Dme-Panxδ1$' 'Unknown_Panx'
Output
4 50
Unknown_Panx ----YKLLGSLKSYLKWQ-IQTDNAVFRLHNSFTTVLLLTCSLIITATQY
Dme-Panxδ11 ----MDVFGSVKGLLKID--QVDNNVFRMHYKATVIILIAFSLLVTSRQY
Dme-Panxδ3 ------------GFIKID-----NMVFRCHYRIT-AILFTCCIIVTANNL
Dme-Panxδ4 ----MAAVKPLSKYLQFK-VHIYDAIFTLHSKVTVALLLACTFLLSSKQY
4 50
Mle-Panxα1 MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
Mle-Panxα5 --MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGII
Mle-Panxα6 --MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQY
Mle-Panxα9 ---MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQY
Usage example 4
Keep part of the match in the replacement
$: alb C-terms.physr -ri '^(..)e-Panx([αδ][0-9]+)$' '\1-Inx\2'
Output
4 50
Dm-Inxδ1 ----YKLLGSLKSYLKWQ-IQTDNAVFRLHNSFTTVLLLTCSLIITATQY
Dm-Inxδ11 ----MDVFGSVKGLLKID--QVDNNVFRMHYKATVIILIAFSLLVTSRQY
Dm-Inxδ3 ------------GFIKID-----NMVFRCHYRIT-AILFTCCIIVTANNL
Dm-Inxδ4 ----MAAVKPLSKYLQFK-VHIYDAIFTLHSKVTVALLLACTFLLSSKQY
4 50
Ml-Inxα1 MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
Ml-Inxα5 --MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGII
Ml-Inxα6 --MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQY
Ml-Inxα9 ---MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQY
Usage example 5
Limit the number of matches
$: alb C-terms.physr -ri '[a-z]' '?' 2
Output
4 50
D??-Panxδ1 ----YKLLGSLKSYLKWQ-IQTDNAVFRLHNSFTTVLLLTCSLIITATQY
D??-Panxδ11 ----MDVFGSVKGLLKID--QVDNNVFRMHYKATVIILIAFSLLVTSRQY
D??-Panxδ3 ------------GFIKID-----NMVFRCHYRIT-AILFTCCIIVTANNL
D??-Panxδ4 ----MAAVKPLSKYLQFK-VHIYDAIFTLHSKVTVALLLACTFLLSSKQY
4 50
M??-Panxα1 MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
M??-Panxα5 --MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGII
M??-Panxα6 --MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQY
M??-Panxα9 ---MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQY
Usage example 6
Match from right-to-left
$: alb C-terms.physr -ri '[a-z]' '?' -2
Output
4 50
Dme-Pa??δ1 ----YKLLGSLKSYLKWQ-IQTDNAVFRLHNSFTTVLLLTCSLIITATQY
Dme-Pa??δ11 ----MDVFGSVKGLLKID--QVDNNVFRMHYKATVIILIAFSLLVTSRQY
Dme-Pa??δ3 ------------GFIKID-----NMVFRCHYRIT-AILFTCCIIVTANNL
Dme-Pa??δ4 ----MAAVKPLSKYLQFK-VHIYDAIFTLHSKVTVALLLACTFLLSSKQY
4 50
Mle-Pa??α1 MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
Mle-Pa??α5 --MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGII
Mle-Pa??α6 --MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQY
Mle-Pa??α9 ---MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQY