Home - mariehoffmann/isPCR GitHub Wiki

Welcome to isPCR 🍂

isPCR is an in silico PCR tool for exploring the efficiency of a primer pair on an arbitrary taxonomic branch. You can either explore the results visually or create a dump file that lists, for each lineage below the user-given root node, coverage and resolution statistics at the species and genus levels.

            \`.__..--'' `.
            ( _          ,\
           ( <_< < <   `','`.
            \ (_< < <    \   `.
             `. `----'   (  q _p
               `-._  _.-' `-(_''\
                (_'))--,      `._\
                   `-._<

Usage

  • isPCR provides some common primer sequences. To see a list and run an in silico PCR with one of them, type:
python isPCR.py --list_primers
python isPCR.py --taxid 12345 --primer DIV4
  • or define your own sequences:
python isPCR.py --taxid 12345 --primer_fwd <sequence> --primer_rev <sequence>
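
To make the idea of an in silico PCR concrete, here is a minimal sketch of what a primer pair does to a single template sequence. It is not how isPCR works internally (isPCR searches a BLAST database, see Getting Started); it assumes exact primer matches, and the function names and toy sequences are invented for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_amplicon(template: str, primer_fwd: str, primer_rev: str):
    """Return the amplified region (primers included), or None if a primer fails to bind.

    Assumption: exact matching only. The forward primer must occur on the
    given strand, and the reverse primer must occur downstream of it as its
    reverse complement.
    """
    template = template.upper()
    primer_fwd = primer_fwd.upper()
    start = template.find(primer_fwd)
    if start == -1:
        return None
    rev_site = reverse_complement(primer_rev)
    end = template.find(rev_site, start + len(primer_fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


# Toy template and primers, made up for this example.
template = "TTGACCGTAAGGCTTACGATCCGGATTAGCAA"
print(find_amplicon(template, "ACCGTAAG", "CTAATCCG"))
```

A real run additionally has to tolerate mismatches and scan every reference sequence below the chosen taxid, which is why the tool relies on a sequence search database rather than plain string matching.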

Getting Started

To run the in silico PCR tool on the latest reference dataset, e.g. the one provided by NCBI on their download page, you have to set up two databases: one that allows efficient traversal of the taxonomic tree, and one that allows searching for sequence matches via BLAST. From here on I assume that you use PostgreSQL, the taxonomy files, and the nt dataset provided by NCBI, but you can use any reference sequence file that follows the same format, i.e. FASTA format and NCBI nomenclature for accession numbers. If you use the setup.sh script, the following directories will be created:

  • ~/isPCR/tmp for temporary files
  • ~/isPCR/taxDB for the taxonomy database
  • ~/isPCR/blastDB for the reference database

Prerequisites

  • Python 3.2 to 3.7 or Python Anaconda
  • psycopg2 - a PostgreSQL adapter for Python
    • install with pip when using plain Python, or with conda when using the Anaconda distribution
  • PostgreSQL 7.4 to 11
  • Free disk space, depending on the size of your reference data set. The complete nt (.fasta) data set is about 210 GB.

Step 1: Clone isPCR

In your git directory:

cd ~/git
git clone https://github.com/mariehoffmann/isPCR.git
cd isPCR

Step 2: Build Taxonomy Database

Launch the setup routine, which will

  • create a directory for the taxonomy database (~/isPCR/taxDB)
  • download the latest taxonomy and verify its md5 hash
  • build the taxonomy database

The same procedure applies to updates of the taxonomy.

python setup.py --taxonomy

If you wish to run the download and the build separately, use

python setup.py --taxonomy --download
python setup.py --taxonomy --build
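
If you run the download step on its own, you may want to repeat the md5 check by hand before building. The following is a minimal sketch, not the code used by setup.py. It assumes that the checksum file follows the usual md5sum layout, with the hex digest as the first whitespace-separated token; the function names are made up for this example.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so that large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(archive, md5_file):
    """Compare an archive against a checksum file.

    Assumption: the first whitespace-separated token of the checksum
    file is the expected hex digest (md5sum convention).
    """
    expected = Path(md5_file).read_text().split()[0].lower()
    return md5_of_file(archive) == expected
```

For example, `matches_md5_file("taxdump.tar.gz", "taxdump.tar.gz.md5")` should return `True` for an intact download, provided both files sit in the current directory.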

As shown in the taxonomy schema, the taxonomy database has four relations. Three of them are a subset of the tables provided by NCBI when downloading the taxonomy package, whereas Accessions contains a pre-computed list of available accessions given a taxonomic identifier (taxid). For details see Wiki Taxonomy.
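
To illustrate why a tree-shaped taxonomy plus a taxid-to-accessions lookup is useful, here is a small in-memory sketch that collects all taxa below a root node and then their accessions. It uses plain dictionaries instead of PostgreSQL and is not the schema or query used by isPCR. The input is assumed to follow the pipe-delimited layout of NCBI's nodes.dmp (tax_id, parent tax_id, and rank as the first three fields); all taxids and accessions below are invented.

```python
from collections import defaultdict, deque


def parse_nodes(lines):
    """Build parent -> children and taxid -> rank maps from nodes.dmp-style lines."""
    children = defaultdict(list)
    rank = {}
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        taxid, parent = int(fields[0]), int(fields[1])
        rank[taxid] = fields[2]
        if taxid != parent:  # the root node lists itself as its parent
            children[parent].append(taxid)
    return children, rank


def descendants(children, root):
    """Return all taxids below root in breadth-first order (root excluded)."""
    found, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            found.append(child)
            queue.append(child)
    return found


# Toy taxonomy: one root, two genera, two species under the first genus.
toy_nodes = [
    "1\t|\t1\t|\tno rank\t|",
    "10\t|\t1\t|\tgenus\t|",
    "11\t|\t10\t|\tspecies\t|",
    "12\t|\t10\t|\tspecies\t|",
    "20\t|\t1\t|\tgenus\t|",
]
# Hypothetical stand-in for a taxid -> accessions lookup.
toy_accessions = {11: ["ACC0001.1"], 12: ["ACC0002.1", "ACC0003.1"]}

children, rank = parse_nodes(toy_nodes)
below_genus = descendants(children, 10)
accessions = [acc for taxid in below_genus for acc in toy_accessions.get(taxid, [])]
print(below_genus, accessions)
```

Pre-computing the accession list per taxid, as the Accessions relation does, saves this lookup from scanning the full reference data set for every query.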

Step 3: Build Reference Database

Launch the setup with the reference flag, which will

  • create a directory for the reference database (~/isPCR/refDB)
  • download the latest nt data set from NCBI and verify its md5 hash
  • build the reference database, indexed by accession number
python setup.py --reference

If you wish to run the download and the build separately, use

python setup.py --reference --download
python setup.py --reference --build

Make sure that you have write permissions for the download and build locations.
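
As an illustration of what "indexed by accession number" buys you, the following sketch maps each accession in a FASTA file to the byte offset of its record, so that a single sequence can be fetched without rescanning the file. This is a toy stand-in, not the index that setup.py builds. It assumes that the accession is the first whitespace-delimited token after `>` in each header line; the records shown are invented.

```python
import io


def index_fasta(handle):
    """Map each accession to the byte offset of its header line.

    The handle must be opened in binary mode so that offsets are exact.
    """
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            accession = line[1:].split(None, 1)[0].decode("ascii")
            index[accession] = offset
    return index


def fetch_sequence(handle, index, accession):
    """Jump to a record via the index and return its sequence as one string."""
    handle.seek(index[accession])
    handle.readline()  # skip the header line
    parts = []
    for line in handle:
        if line.startswith(b">"):
            break  # the next record starts here
        parts.append(line.strip())
    return b"".join(parts).decode("ascii")


# Toy FASTA content held in memory; a real file would be opened with open(path, "rb").
toy_fasta = io.BytesIO(
    b">ACC0001.1 toy record one\nACGT\nACGT\n>ACC0002.1 toy record two\nTTGG\n"
)
fasta_index = index_fasta(toy_fasta)
print(fetch_sequence(toy_fasta, fasta_index, "ACC0002.1"))
```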
