Home - mariehoffmann/isPCR GitHub Wiki
isPCR is an in silico PCR tool for exploring the efficiency of a primer pair on an arbitrary taxonomic branch. You can either explore the results visually or create a dump file that lists, for each lineage below the user-given root node, coverage and resolution statistics at the species and genus levels.
\`.__..--'' `.
( _ ,\
( <_< < < `','`.
\ (_< < < \ `.
`. `----' ( q _p
`-._ _.-' `-(_''\
(_'))--, `._\
`-._<
- isPCR provides some common primer sequences. To see a list and run an in silico PCR with one of them, type:
python isPCR.py --list_primers
python isPCR.py --taxid 12345 --primer DIV4
- or define your own sequences:
python isPCR.py --taxid 12345 --primer_fwd <sequence> --primer_rev <sequence>
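As a rough illustration of what an in silico PCR computes for a single sequence, the sketch below looks for an exact forward-primer match and, downstream of it, the reverse complement of the reverse primer. It is not isPCR's implementation (isPCR searches via BLAST); the function names `reverse_complement` and `amplicon` and the toy template are assumptions made for this example.

```python
# Illustrative only: exact-match in silico PCR on one template strand.
# The function names and the toy template are hypothetical, not part of isPCR.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon(template, primer_fwd, primer_rev):
    """Return the amplified region (primers included), or None if no product.

    primer_fwd must occur on the given strand. primer_rev is written 5'->3'
    and binds the opposite strand, so its reverse complement is searched
    downstream of the forward primer.
    """
    start = template.find(primer_fwd)
    if start == -1:
        return None
    rev_site = reverse_complement(primer_rev)
    end = template.find(rev_site, start + len(primer_fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


if __name__ == "__main__":
    toy_template = "AAACCGGTTTACGTACGTAGCTTGCAAA"
    # prints CCGGTTTACGTACGTAGCTTGC
    print(amplicon(toy_template, "CCGGTT", "GCAAGC"))
```

Real primers rarely match exactly, which is one reason to rely on an alignment search rather than on plain string matching.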
To run the in silico PCR tool on the latest reference dataset, e.g. the one NCBI provides on its download page, you have to set up two databases: one that allows efficient traversal of the taxonomic tree, and one that allows searching for sequence matches via BLAST. From here on I assume that you use PostgreSQL, the taxonomy files, and the nt dataset provided by NCBI, but you can use any reference sequence file that follows the same format, i.e. FASTA format and NCBI nomenclature for accession numbers. If you use the `setup.sh` script, the following directories will be created:
- `~/isPCR/tmp` for temporary files
- `~/isPCR/taxDB` for the taxonomy database
- `~/isPCR/blastDB` for the reference database
- Python 3.2 to 3.7 or Python Anaconda
- psycopg2, a PostgreSQL adapter for Python
  - install with `pip` when using Python and `conda` when using the Anaconda distribution
- PostgreSQL 7.4 to 11
- Free space depending on the size of your reference data set. The complete `nt(.fasta)` data set is about 210 GB in size.
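The requirements above can be checked before starting a long download. The following is a minimal sketch, not part of isPCR: the helper names are hypothetical, the version range and the 210 GB figure are copied from the list above, and `importlib.util.find_spec` needs Python 3.4 or newer.

```python
import importlib.util
import shutil
import sys

# Approximate size of the complete nt data set, as stated in the requirements.
NT_SIZE_GB = 210


def python_supported(version=sys.version_info):
    """True if the interpreter is within the stated 3.2 to 3.7 range."""
    return (3, 2) <= tuple(version[:2]) <= (3, 7)


def has_module(name):
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def free_gb(path="."):
    """Free space in GB on the file system that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    print("Python version in stated range:", python_supported())
    print("psycopg2 installed:", has_module("psycopg2"))
    print("Enough free space for nt:", free_gb() >= NT_SIZE_GB)
```

The PostgreSQL version is not covered here because checking it requires a running server and connection settings that depend on your installation.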
In your git directory:
cd ~/git
git clone https://github.com/mariehoffmann/isPCR.git
cd isPCR
Launch the `setup` routine, which will
- create a directory for the taxonomy database (`~/isPCR/taxDB`)
- download the latest taxonomy with md5 hash check
- build the taxonomy database
The same procedure applies to updates of the taxonomy.
python setup.py --taxonomy
If you wish to separate downloading and building, run:
python setup.py --taxonomy --download
python setup.py --taxonomy --build
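The md5 hash check mentioned for the download step can be reproduced by hand. Here is a minimal sketch, assuming the checksum file contains the hex digest as its first whitespace-separated token (the usual `md5sum` layout); the function names are hypothetical and this is not necessarily how `setup.py` performs the check.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(archive_path, md5_path):
    """Compare an archive with a checksum file of the form '<digest>  <name>'.

    Assumption: the expected digest is the first token in the checksum file.
    """
    with open(md5_path) as handle:
        expected = handle.read().split()[0].lower()
    return md5_of_file(archive_path) == expected
```

A call such as `checksum_matches("taxdump.tar.gz", "taxdump.tar.gz.md5")` would return `False` for a truncated or corrupted download; the file names here are placeholders for whatever you downloaded.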
As shown in the taxonomy schema, the taxonomy database has four relations. Three of them are a subset of the tables provided by NCBI when downloading the taxonomy package, whereas Accessions contains a pre-computed list of the accessions available for a given taxonomic identifier (taxid). For details, see the Taxonomy wiki page.
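To make the role of the taxonomy database more concrete, the sketch below collects all accessions at or below a root taxid with a recursive query. Everything in it is an assumption for illustration: the table and column names (`node`, `accessions`, `tax_id`, `parent_tax_id`) are simplified stand-ins for the real schema, the rows are toy data, and an in-memory SQLite database replaces PostgreSQL so that the example runs without a server. PostgreSQL supports the same `WITH RECURSIVE` syntax from version 8.4 onwards.

```python
import sqlite3

# Hypothetical, simplified schema; see the Taxonomy wiki page for the real one.
SCHEMA = """
CREATE TABLE node (tax_id INTEGER PRIMARY KEY, parent_tax_id INTEGER);
CREATE TABLE accessions (tax_id INTEGER, accession TEXT);
"""

# Walk the tree downwards from the root taxid, then join the accessions.
# UNION (not UNION ALL) drops duplicates, so a self-referencing root is safe.
SUBTREE_ACCESSIONS = """
WITH RECURSIVE subtree(tax_id) AS (
    SELECT ?
    UNION
    SELECT node.tax_id
    FROM node JOIN subtree ON node.parent_tax_id = subtree.tax_id
)
SELECT accessions.accession
FROM accessions JOIN subtree ON accessions.tax_id = subtree.tax_id
ORDER BY accessions.accession;
"""


def accessions_below(conn, root_taxid):
    """Return all accessions attached to root_taxid or any of its descendants."""
    return [row[0] for row in conn.execute(SUBTREE_ACCESSIONS, (root_taxid,))]


def demo_connection():
    """Build a toy tree: 1 -> {10 -> {11, 12}, 20}."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO node VALUES (?, ?)",
        [(1, None), (10, 1), (11, 10), (12, 10), (20, 1)],
    )
    conn.executemany(
        "INSERT INTO accessions VALUES (?, ?)",
        [(11, "ACC_A"), (12, "ACC_B"), (20, "ACC_C")],
    )
    return conn


if __name__ == "__main__":
    print(accessions_below(demo_connection(), 10))
```

A pre-computed Accessions relation, as described above, avoids repeating this kind of tree walk on every lookup.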
Launch the `setup` routine with the `reference` flag, which will
- create a directory for the reference database (`~/isPCR/refDB`)
- download the latest `nt` data set from NCBI with md5 hash check
- build the reference database, indexed by accession numbers
python setup.py --reference
If you wish to separate downloading and building, run:
python setup.py --reference --download
python setup.py --reference --build
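The idea of a reference file indexed by accession numbers can be sketched in a few lines. This is only an illustration under stated assumptions: it treats the first whitespace-delimited token of each FASTA header as the accession, stores byte offsets in a plain dictionary, and uses hypothetical function names. It does not describe how `setup.py` or BLAST actually build their databases.

```python
def index_fasta(path):
    """Map each accession to the byte offset of its header line.

    Assumption: the accession is the first token after '>' in the header.
    """
    index = {}
    offset = 0
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                tokens = line[1:].split(None, 1)
                if tokens:  # skip headers that carry no identifier
                    index[tokens[0].decode("ascii")] = offset
            offset += len(line)
    return index


def fetch_sequence(path, index, accession):
    """Read one record's sequence by seeking to its indexed offset."""
    parts = []
    with open(path, "rb") as handle:
        handle.seek(index[accession])
        handle.readline()  # skip the header line
        for line in handle:
            if line.startswith(b">"):
                break  # next record starts here
            parts.append(line.strip())
    return b"".join(parts).decode("ascii")
```

With such an index, a single record can be read without scanning the whole file, which matters for a data set of the size of `nt`.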
Make sure that you have write permissions for the download and build locations.
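Write permissions can be checked up front, before a large download starts. The following is a minimal sketch with hypothetical helper names; because the target directories may not exist yet, it tests the nearest existing parent directory instead. The two paths in the demo are the ones named on this page.

```python
import os


def nearest_existing(path):
    """Return the closest existing ancestor of `path` (or `path` itself)."""
    path = os.path.abspath(os.path.expanduser(path))
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the file system root
            break
        path = parent
    return path


def can_write(path):
    """True if files could be created at `path` or in its nearest existing parent."""
    target = nearest_existing(path)
    return os.path.isdir(target) and os.access(target, os.W_OK | os.X_OK)


if __name__ == "__main__":
    for location in ("~/isPCR/taxDB", "~/isPCR/refDB"):
        print(location, "writable:", can_write(location))
```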