NGS-PrimerPlex installation on Linux as a standalone tool - aakechin/NGS-PrimerPlex GitHub Wiki

This is a detailed description of installing NGS-PrimerPlex without the GUI or the Docker image. First, download the NGS-PrimerPlex files from GitHub.


Save the archive to a folder of your choice and unzip it. You will get the folder NGS-PrimerPlex-master. The next steps are carried out on the command line (the terminal in Linux).


Run the automatic installation script (install_for_linux.sh). It requires administrator rights, so run it with sudo from the folder that contains it, for example:

sudo bash install_for_linux.sh

This script installs all necessary dependencies. Next, obtain a reference genome if you do not have one yet. It can be the reference genome of any organism. Ideally, it should be well annotated; for E. coli, for example, many genomes are available, but not all of them have good annotation. However, annotation only affects the automatic extraction of target regions. All other procedures can be carried out with any reference genome, or even with any FASTA file that you would like to use as a reference (e.g. gene sequences). To prepare the human reference genome, download it in your browser:

http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit

Then download the twoBitToFa tool, which converts a 2bit file to a FASTA file:

http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/twoBitToFa

Open a terminal in the folder where the reference and the tool are located, make the tool executable, run the conversion, and then index the resulting FASTA file with BWA:

chmod +x twoBitToFa
./twoBitToFa hg19.2bit ucsc.hg19.fasta
bwa index ucsc.hg19.fasta
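Because indexing runs for a long time, it can be worth confirming that it actually finished. The following is a minimal sketch, assuming that BWA writes its usual index files (suffixes .amb, .ann, .bwt, .pac and .sa) next to the FASTA file. The function name missing_bwa_index_files is hypothetical and is not part of NGS-PrimerPlex.

```python
from pathlib import Path

# Suffixes that `bwa index` normally appends to the reference FASTA file name
# (assumption: a standard BWA installation with default options).
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(fasta_path):
    """Return the expected BWA index files that do not exist yet."""
    expected = [Path(str(fasta_path) + suffix) for suffix in BWA_INDEX_SUFFIXES]
    return [str(path) for path in expected if not path.is_file()]
```

For example, missing_bwa_index_files("ucsc.hg19.fasta") returns an empty list once all five index files are present.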

Both processes take a long time. If you want to extract target genome regions automatically, you also have to download a GenBank file for each chromosome of the genome version that you are going to use, e.g. from the NCBI Genome database. Each GenBank file should be named either after the chromosome's name in the reference FASTA file or after its position in that file. For example, for the hg19 version above, the GenBank file for chromosome 1 can be named chr1.gb or 2.gb (because in the reference genome chrM is the 1st sequence and chr1 is the 2nd). You can view the chromosome names in the hg19 reference FASTA file with less ucsc.hg19.fasta:

hg19_less

Other chromosomes follow the same format (chrM, chr1, chr2, etc.). To get the GenBank files, open NCBI Genome and search for human. At the bottom of the page there are links to the files for each chromosome. Open them one by one and save each as GenBank (full).

Name the file for chromosome 1 chr1.gb. Repeat this for the other chromosomes, naming each file as the chromosome is called in the ucsc.hg19.fasta file.
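The naming rule can be checked before running the region extraction. The sketch below is an illustration, not part of NGS-PrimerPlex: it reads the sequence names from the FASTA file in order and reports every sequence for which neither <name>.gb nor <position>.gb exists in a given folder. The function names and the folder layout are assumptions.

```python
import os


def fasta_chromosome_names(fasta_path):
    """Return sequence names from a FASTA file in the order they appear."""
    names = []
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    # Keep only the first word of the header, e.g. ">chr1 ..." -> "chr1"
                    names.append(fields[0])
    return names


def missing_genbank_files(fasta_path, genbank_dir):
    """List sequences that have neither <name>.gb nor <position>.gb in genbank_dir."""
    missing = []
    for position, name in enumerate(fasta_chromosome_names(fasta_path), start=1):
        candidates = (name + ".gb", str(position) + ".gb")
        if not any(os.path.isfile(os.path.join(genbank_dir, c)) for c in candidates):
            missing.append(name)
    return missing
```

Calling missing_genbank_files("hg19/ucsc.hg19.fasta", "hg19/") would list the chromosomes that still need a GenBank file. If the reference contains additional sequences besides the main chromosomes, they are listed as well.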

To check whether primers overlap SNPs, also download a bgzipped VCF file with known variants for the genome build of your reference, together with its index (.tbi):

wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/common_all_20180423.vcf.gz
wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/common_all_20180423.vcf.gz.tbi
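VCF files and reference FASTA files from different sources may name chromosomes differently (for example 1 versus chr1). This is an assumption worth checking rather than something this guide states. A minimal sketch that lists the chromosome names used in a gzipped VCF file, so that they can be compared with the names in the reference; the function name vcf_chromosome_names is hypothetical:

```python
import gzip


def vcf_chromosome_names(vcf_gz_path):
    """Return distinct CHROM values of a gzipped VCF file in order of appearance."""
    names = []
    seen = set()
    # bgzipped files are valid multi-member gzip files, so gzip.open can read them.
    with gzip.open(vcf_gz_path, "rt") as vcf:
        for line in vcf:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and header lines
            chrom = line.split("\t", 1)[0]
            if chrom not in seen:
                seen.add(chrom)
                names.append(chrom)
    return names
```

The function scans the whole file, so on a genome-wide VCF it can take a few minutes.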

Now we have all necessary files and are ready to run the test script:

python3 /NGS-PrimerPlex/test.py

All tests should complete successfully. If you encounter any errors, please report them in the Issues section of the GitHub repository.

Now you can run the example primer design or design primers for your own list of genes:

python3 getGeneRegions.py -glf example_gene_list_file.txt -ref hg19/ -rf example_gene_list_file.regions.csv 
python3 NGS_primerplex.py -regions example_gene_list_file.regions.csv -ref hg19/ucsc.hg19.fasta -blast -snps -dbsnp hg19/common_all_20180423_hg19.vcf.gz

This will give you primers designed with the default parameters. The defaults are chosen so that primers can reliably be obtained for the example. For subsequent use of the program, we recommend more stringent parameters. You can then pass the generated file with draft primers as the -draft argument and define less strict parameters for the primer design.