SB Pull records - mendessoares/BuddySuite GitHub Wiki
--pull_records, -pr
Description
Return all sequences whose IDs contain a match to a regular expression pattern. The search also covers the 'description' field if you specify the 'full' keyword.
Arguments
One or more search strings ( regex )
As many simple strings or regular expressions as you want. To avoid issues with special characters, make a habit of adding 'single quotes' around the search terms.
'full' ( exact string )
Optional. By default, only the record IDs are searched. If the records have a description field, then you can pass in the word 'full' to expand the search to this metadata. In the rare case that you must search for the exact word 'full' in your IDs, turn it into an explicit regular expression by enclosing it in parentheses --> '(full)'
File path ( path ) (Available in V 1.3)
Optional. If searching for many different records, it can be easier to put the search terms in a separate file. Put each term on its own line, but remember that SeqBuddy is searching for regular expressions! If you are looking for exact ID matches, it is good practice to include the '^' and '$' anchors on each term (e.g., ^id_1234$).
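The effect of the '^' and '$' anchors can be sketched in plain Python. This is an illustration only, assuming each term behaves like a Python `re.search` pattern applied to every ID; the IDs and the `matching_ids` helper are made up for the example and are not SeqBuddy's own code.

```python
import re

# Hypothetical record IDs, for illustration only
ids = ["id_123", "id_1234", "id_12345", "my_id_1234"]


def matching_ids(term, ids):
    """Return the IDs in which the regular expression `term` is found."""
    return [seq_id for seq_id in ids if re.search(term, seq_id)]


# Unanchored: matches every ID that contains 'id_1234'
print(matching_ids("id_1234", ids))    # ['id_1234', 'id_12345', 'my_id_1234']

# Anchored: matches only the exact ID
print(matching_ids("^id_1234$", ids))  # ['id_1234']
```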
Examples
Input file: Panx-ends.fa
>Dme-Panxδ1
YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPI
>Dme-Panxδ2
MDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPID
>Dme-Panxδ3
GFIKIDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFC
>Dme-Panxδ4
MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPI
>Mle-Panxα1 cDNA - ML078817.
MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
>Mle-Panxα5 cDNA - ML223536a.
MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIAG
>Mle-Panxα6 cDNA - ML25993a.
MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTG
>Mle-Panxα9 cDNA - ML47742a.
MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQYTGS
Usage example 1
$: sb Panx-ends.fa -pr 'Dme'
Output
>Dme-Panxδ1
YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPI
>Dme-Panxδ2
MDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPID
>Dme-Panxδ3
GFIKIDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFC
>Dme-Panxδ4
MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPI
Usage example 2
$: sb Panx-ends.fa -pr '.*Panx[αδ][1-2]'
Output
>Dme-Panxδ1
YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPI
>Dme-Panxδ2
MDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPID
>Mle-Panxα1 cDNA - ML078817.
MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
Usage example 3
$: sb Panx-ends.fa -pr 'δ1' 'α5'
Output
>Dme-Panxδ1
YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPI
>Mle-Panxα5 cDNA - ML223536a.
MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIAG
Usage example 4
Include the description metadata in the search with the 'full' keyword
$: sb Panx-ends.fa -pr 'δ1' 'ML[0-9]*a' 'full'
Output
>Dme-Panxδ1
YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPI
>Mle-Panxα5 cDNA - ML223536a.
MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIAG
>Mle-Panxα6 cDNA - ML25993a.
MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTG
>Mle-Panxα9 cDNA - ML47742a.
MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQYTGS
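Why 'full' changes the result here can be sketched in Python. The snippet assumes that a FASTA header splits at the first whitespace into an ID and a description, and that terms are applied with `re.search`. The `pull` helper is hypothetical and not part of SeqBuddy.

```python
import re

# Headers from Panx-ends.fa (sequences omitted)
headers = [
    "Dme-Panxδ1",
    "Dme-Panxδ2",
    "Dme-Panxδ3",
    "Dme-Panxδ4",
    "Mle-Panxα1 cDNA - ML078817.",
    "Mle-Panxα5 cDNA - ML223536a.",
    "Mle-Panxα6 cDNA - ML25993a.",
    "Mle-Panxα9 cDNA - ML47742a.",
]


def pull(headers, terms, full=False):
    """Hypothetical helper: keep headers where any term matches the ID,
    or the ID plus description when full=True."""
    kept = []
    for header in headers:
        seq_id = header.split(maxsplit=1)[0]
        target = header if full else seq_id
        if any(re.search(term, target) for term in terms):
            kept.append(header)
    return kept


terms = ["δ1", "ML[0-9]*a"]
id_only = pull(headers, terms)              # searches IDs only
with_desc = pull(headers, terms, full=True)  # searches IDs and descriptions
print(id_only)
print(with_desc)
```

Without 'full', only 'δ1' finds a record, because the 'ML...a' accession numbers appear only in the descriptions.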
Usage example 5
Read from a file of search terms
Search terms file: names.txt
^Dme-Panxδ1$
Mle-Panxα[59]
$: sb Panx-ends.fa -pr names.txt
Output
>Dme-Panxδ1
YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPI
>Mle-Panxα5 cDNA - ML223536a.
MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIAG
>Mle-Panxα9 cDNA - ML47742a.
MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQYTGS
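When exact ID matches are wanted, a search terms file can be generated rather than typed by hand. The following is a rough sketch, assuming the terms are interpreted as Python-style regular expressions; the file name `exact_ids.txt`, the ID list, and the `exact_terms` helper are placeholders chosen for this example.

```python
import re

# IDs to pull, exactly as they appear in the sequence file
wanted = ["Dme-Panxδ1", "Mle-Panxα5"]


def exact_terms(ids):
    """Escape regex metacharacters and anchor each ID with ^ and $."""
    return ["^" + re.escape(seq_id) + "$" for seq_id in ids]


terms = exact_terms(wanted)

# Write one term per line
with open("exact_ids.txt", "w", encoding="utf-8") as handle:
    handle.write("\n".join(terms) + "\n")
```

The resulting file could then be passed in place of the search strings, e.g. `sb Panx-ends.fa -pr exact_ids.txt`.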