AB Rename IDs - mendessoares/BuddySuite GitHub Wiki

--rename_ids, -ri

Description

Modify record identifiers by searching for simple strings or more complex regular expressions. Each match will be replaced with your substitution string within the ID.

Arguments

Query ( regex )

The query is a regular expression that is searched for inside every ID. Only the matched sub-string is replaced, not the entire ID. To match the entire ID, prefix the query with ^ and suffix it with $; these are the 'start of string' and 'end of string' anchors, respectively (see example 3).
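The difference between an unanchored and an anchored query can be sketched with Python's `re` module. This is only an illustration of the regular expression semantics described above; using `re.sub` as a stand-in for the `-ri` flag is an assumption, and the replacement string `X` is a placeholder.

```python
import re

# Two IDs from the example file; the second contains the first as a prefix.
ids = ["Dme-Panxδ1", "Dme-Panxδ11"]

# Unanchored: the pattern is found inside both IDs, so both are modified,
# and only the matched part is replaced.
unanchored = [re.sub(r"Panxδ1", "X", record_id) for record_id in ids]

# Anchored with ^ and $: only an ID that matches in full is modified.
anchored = [re.sub(r"^Dme-Panxδ1$", "X", record_id) for record_id in ids]

print(unanchored)  # ['Dme-X', 'Dme-X1']
print(anchored)    # ['X', 'Dme-Panxδ11']
```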

Substitution ( str )

All matches to the query will be replaced with this exact string. To retain part of the match in the substitution, enclose the relevant part of the query in parentheses () and refer to it with a backslash followed by a number: '\1' for the first set of parentheses, '\2' for the second, etc. (see example 4).
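As a rough illustration of how numbered groups carry part of the match into the substitution, the query and substitution from example 4 can be tried with Python's `re.sub`. Treating `re.sub` as equivalent to the `-ri` flag is an assumption made for this sketch.

```python
import re

record_id = "Mle-Panxα9"

# Group 1 captures the first two characters of the ID ("Ml");
# group 2 captures the subtype letter and number ("α9").
query = r"^(..)e-Panx([αδ][0-9]+)$"
substitution = r"\1-Inx\2"

renamed = re.sub(query, substitution, record_id)
print(renamed)  # Ml-Inxα9
```

Note that on the command line the query and substitution are wrapped in single quotes, which keeps the shell from interpreting the backslashes and parentheses.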

Max replacements ( int )

Optional. If a pattern occurs in an ID more than once but only some of those matches should be replaced, set a maximum number of replacements (see example 5). The limit applies to each ID separately. The default is '0', which corresponds to 'all'. To replace from right-to-left instead of left-to-right, provide a negative number (see example 6).
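One way to picture the effect of positive, zero, and negative limits is the minimal Python sketch below. The helper `replace_limited` is hypothetical and is not taken from the BuddySuite source; in particular, treating a negative limit as "the rightmost matches of an ordinary left-to-right scan" is an assumption that happens to reproduce examples 5 and 6.

```python
import re

def replace_limited(pattern, substitution, text, max_replace=0):
    """Hypothetical helper: replace at most |max_replace| matches (0 = all)."""
    if max_replace >= 0:
        # re.sub treats count=0 as "replace every match".
        return re.sub(pattern, substitution, text, count=max_replace)
    # Negative limit: keep only the last |max_replace| matches.
    matches = list(re.finditer(pattern, text))[max_replace:]
    # Splice from the right so the offsets of earlier matches stay valid.
    for match in reversed(matches):
        replacement = match.expand(substitution)  # resolves \1, \2, ...
        text = text[:match.start()] + replacement + text[match.end():]
    return text

print(replace_limited(r"[a-z]", "?", "Dme-Panxδ1", 2))   # D??-Panxδ1
print(replace_limited(r"[a-z]", "?", "Dme-Panxδ1", -2))  # Dme-Pa??δ1
print(replace_limited(r"[a-z]", "?", "Dme-Panxδ1"))      # D??-P???δ1
```

The character δ is left alone in every case because `[a-z]` covers only the ASCII lowercase letters.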

Examples

Input file: C-terms.physr

 4 50
Dme-Panxδ1   ----YKLLGSLKSYLKWQ-IQTDNAVFRLHNSFTTVLLLTCSLIITATQY
Dme-Panxδ11  ----MDVFGSVKGLLKID--QVDNNVFRMHYKATVIILIAFSLLVTSRQY
Dme-Panxδ3   ------------GFIKID-----NMVFRCHYRIT-AILFTCCIIVTANNL
Dme-Panxδ4   ----MAAVKPLSKYLQFK-VHIYDAIFTLHSKVTVALLLACTFLLSSKQY

 4 50
Mle-Panxα1  MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
Mle-Panxα5  --MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGII
Mle-Panxα6  --MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQY
Mle-Panxα9  ---MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQY

Usage example 1

Simple replacement

$: alb C-terms.physr -ri 'Mle' 'Mnemiopsis'

Output

 4 50
Dme-Panxδ1   ----YKLLGSLKSYLKWQ-IQTDNAVFRLHNSFTTVLLLTCSLIITATQY
Dme-Panxδ11  ----MDVFGSVKGLLKID--QVDNNVFRMHYKATVIILIAFSLLVTSRQY
Dme-Panxδ3   ------------GFIKID-----NMVFRCHYRIT-AILFTCCIIVTANNL
Dme-Panxδ4   ----MAAVKPLSKYLQFK-VHIYDAIFTLHSKVTVALLLACTFLLSSKQY

 4 50
Mnemiopsis-Panxα1  MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
Mnemiopsis-Panxα5  --MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGII
Mnemiopsis-Panxα6  --MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQY
Mnemiopsis-Panxα9  ---MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQY

Usage example 2

Incorporate a regular expression

$: alb C-terms.physr -ri 'Panx[αδ]1' 'Panx?'

Output

 4 50
Dme-Panx?   ----YKLLGSLKSYLKWQ-IQTDNAVFRLHNSFTTVLLLTCSLIITATQY
Dme-Panx?1  ----MDVFGSVKGLLKID--QVDNNVFRMHYKATVIILIAFSLLVTSRQY
Dme-Panxδ3  ------------GFIKID-----NMVFRCHYRIT-AILFTCCIIVTANNL
Dme-Panxδ4  ----MAAVKPLSKYLQFK-VHIYDAIFTLHSKVTVALLLACTFLLSSKQY

 4 50
Mle-Panx?   MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
Mle-Panxα5  --MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGII
Mle-Panxα6  --MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQY
Mle-Panxα9  ---MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQY

Usage example 3

Match an ID exactly

$: alb C-terms.physr -ri '^Dme-Panxδ1$' 'Unknown_Panx'

Output

 4 50
Unknown_Panx  ----YKLLGSLKSYLKWQ-IQTDNAVFRLHNSFTTVLLLTCSLIITATQY
Dme-Panxδ11   ----MDVFGSVKGLLKID--QVDNNVFRMHYKATVIILIAFSLLVTSRQY
Dme-Panxδ3    ------------GFIKID-----NMVFRCHYRIT-AILFTCCIIVTANNL
Dme-Panxδ4    ----MAAVKPLSKYLQFK-VHIYDAIFTLHSKVTVALLLACTFLLSSKQY

 4 50
Mle-Panxα1  MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
Mle-Panxα5  --MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGII
Mle-Panxα6  --MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQY
Mle-Panxα9  ---MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQY

Usage example 4

Keep part of the match in the replacement

$: alb C-terms.physr -ri '^(..)e-Panx([αδ][0-9]+)$' '\1-Inx\2'

Output

 4 50
Dm-Inxδ1   ----YKLLGSLKSYLKWQ-IQTDNAVFRLHNSFTTVLLLTCSLIITATQY
Dm-Inxδ11  ----MDVFGSVKGLLKID--QVDNNVFRMHYKATVIILIAFSLLVTSRQY
Dm-Inxδ3   ------------GFIKID-----NMVFRCHYRIT-AILFTCCIIVTANNL
Dm-Inxδ4   ----MAAVKPLSKYLQFK-VHIYDAIFTLHSKVTVALLLACTFLLSSKQY

 4 50
Ml-Inxα1  MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
Ml-Inxα5  --MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGII
Ml-Inxα6  --MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQY
Ml-Inxα9  ---MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQY

Usage example 5

Limit the number of replacements

$: alb C-terms.physr -ri '[a-z]' '?' 2

Output

 4 50
D??-Panxδ1   ----YKLLGSLKSYLKWQ-IQTDNAVFRLHNSFTTVLLLTCSLIITATQY
D??-Panxδ11  ----MDVFGSVKGLLKID--QVDNNVFRMHYKATVIILIAFSLLVTSRQY
D??-Panxδ3   ------------GFIKID-----NMVFRCHYRIT-AILFTCCIIVTANNL
D??-Panxδ4   ----MAAVKPLSKYLQFK-VHIYDAIFTLHSKVTVALLLACTFLLSSKQY

 4 50
M??-Panxα1  MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
M??-Panxα5  --MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGII
M??-Panxα6  --MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQY
M??-Panxα9  ---MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQY

Usage example 6

Match from right-to-left

$: alb C-terms.physr -ri '[a-z]' '?' -2

Output

 4 50
Dme-Pa??δ1   ----YKLLGSLKSYLKWQ-IQTDNAVFRLHNSFTTVLLLTCSLIITATQY
Dme-Pa??δ11  ----MDVFGSVKGLLKID--QVDNNVFRMHYKATVIILIAFSLLVTSRQY
Dme-Pa??δ3   ------------GFIKID-----NMVFRCHYRIT-AILFTCCIIVTANNL
Dme-Pa??δ4   ----MAAVKPLSKYLQFK-VHIYDAIFTLHSKVTVALLLACTFLLSSKQY

 4 50
Mle-Pa??α1  MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
Mle-Pa??α5  --MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGII
Mle-Pa??α6  --MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQY
Mle-Pa??α9  ---MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQY