NGS PrimerPlex installation in linux as a standalone tool - aakechin/NGS-PrimerPlex GitHub Wiki
This is a detailed description of installing NGS-PrimerPlex without the GUI or the Docker image. First, download the NGS-PrimerPlex files from GitHub.
Save the archive to the desired folder and unzip it. You will get the folder NGS-PrimerPlex-master. The next steps are performed on the command line (the terminal in Linux).
Run the automatic installation script (install_for_linux.sh). It requires administrator rights, so run it with sudo (for example, sudo bash install_for_linux.sh from inside the NGS-PrimerPlex-master folder).
This script installs all necessary dependencies. Next, obtain a reference genome if you do not already have one. It can be the reference genome of any organism. Ideally, it should be well annotated: for E. coli, for example, many genomes are available, but not all of them have good annotation. However, the annotation affects only the automatic extraction of target regions. The other procedures can be carried out with any reference genome, or even with any FASTA file that you would like to use as a reference (e.g. gene sequences). To prepare the human reference genome, download it in your browser:
http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit
Then download the twoBitToFa tool, which converts a 2bit file to a FASTA file:
http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/twoBitToFa
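Then open a terminal in the folder where the reference and the tool are located, make the tool executable, run the conversion, and index the resulting FASTA file with BWA. The commands below use hg19 file names; if you downloaded a different build, such as hg38.2bit from the link above, substitute its name:
chmod +x twoBitToFa
./twoBitToFa hg19.2bit ucsc.hg19.fasta
bwa index ucsc.hg19.fasta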
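Because indexing a large genome takes a long time, it can be useful to confirm that it finished before moving on. The following is a minimal sketch, not part of NGS-PrimerPlex: the helper name missing_bwa_index_files is hypothetical, and it assumes the classic bwa index command, which writes files with the suffixes .amb, .ann, .bwt, .pac and .sa next to the FASTA file.

```python
from pathlib import Path

# File suffixes that `bwa index` writes next to the FASTA file.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(fasta_path):
    """Return the names of BWA index files that are absent or empty."""
    missing = []
    for suffix in BWA_INDEX_SUFFIXES:
        index_file = Path(str(fasta_path) + suffix)
        if not index_file.is_file() or index_file.stat().st_size == 0:
            missing.append(index_file.name)
    return missing


if __name__ == "__main__":
    # "ucsc.hg19.fasta" is the file name used in this guide; change it if yours differs.
    missing = missing_bwa_index_files("ucsc.hg19.fasta")
    if missing:
        print("Missing or empty index files: " + ", ".join(missing))
    else:
        print("BWA index looks complete")
```

An empty list only shows that the files exist and are not empty; it does not prove that indexing ran without errors.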
Both processes take a long time. If you want to extract target genome regions automatically, you also need to download a GenBank file for each chromosome of the genome version that you are going to use, e.g. from the NCBI Genome database. Each GenBank file should be named either as the chromosome is called in the reference genome FASTA file or by its position in that file. For example, for the hg19 version above, the GenBank file for chromosome 1 can be named chr1.gb or 2.gb (because in the reference genome chrM is written as the 1st chromosome and chr1 as the 2nd). You can look at the chromosome names in the hg19 reference FASTA file with less ucsc.hg19.fasta. All chromosomes are named in the same format (chrM, chr1, chr2 etc.). So, open NCBI Genome and search for human.
Go to the bottom of the page, where the files for each chromosome are listed.
Open them one by one and save each file as GenBank (full).
Name the file for chromosome 1 chr1.gb. Repeat this for the other chromosomes, naming the files as the chromosomes are called in the ucsc.hg19.fasta file.
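The naming rule for GenBank files can be checked with a short script. This is a sketch under stated assumptions rather than part of NGS-PrimerPlex: the function names are hypothetical, and it assumes that all GenBank files are stored in one folder (genbank_dir) and that a chromosome counts as covered when either the name-based file (e.g. chr1.gb) or the order-based file (e.g. 2.gb) exists.

```python
from pathlib import Path


def fasta_sequence_names(fasta_path):
    """Return sequence names from the FASTA header lines, in file order."""
    names = []
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                # The name is the first word after ">".
                words = line[1:].split()
                if words:
                    names.append(words[0])
    return names


def missing_genbank_files(fasta_path, genbank_dir):
    """List chromosomes that have neither <name>.gb nor <order>.gb in genbank_dir."""
    folder = Path(genbank_dir)
    missing = []
    for order, name in enumerate(fasta_sequence_names(fasta_path), start=1):
        by_name = folder / f"{name}.gb"
        by_order = folder / f"{order}.gb"
        if not by_name.is_file() and not by_order.is_file():
            missing.append(name)
    return missing
```

For example, missing_genbank_files("hg19/ucsc.hg19.fasta", "hg19/") would return the chromosome names that still need a GenBank file. Reading a whole human genome line by line takes a while, but it needs very little memory.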
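To check whether primers overlap SNPs, also download a bgzipped VCF file with known variants, together with its index. The VCF file must correspond to the same genome build as your reference (the links below are for GRCh38):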
wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/common_all_20180418.vcf.gz
wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/common_all_20180418.vcf.gz.tbi
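An interrupted download or a missing index is easy to overlook, so a quick check of the two files may help. The sketch below is illustrative only: the function names are hypothetical, and the bgzip test inspects just the first bytes of the file, assuming the usual BGZF layout in which the "BC" extra subfield comes first in the gzip header.

```python
from pathlib import Path


def looks_bgzipped(path):
    """Check the first bytes for a gzip header carrying the BGZF 'BC' extra subfield."""
    with open(path, "rb") as handle:
        header = handle.read(14)
    return (
        len(header) == 14
        and header[:4] == b"\x1f\x8b\x08\x04"  # gzip magic, deflate, FEXTRA flag set
        and header[12:14] == b"BC"  # BGZF subfield identifier
    )


def check_snp_vcf(vcf_path):
    """Return a list of problems found with a bgzipped VCF file and its tabix index."""
    vcf = Path(vcf_path)
    if not vcf.is_file():
        return [f"{vcf.name} not found"]
    problems = []
    if not looks_bgzipped(vcf):
        problems.append(f"{vcf.name} does not look bgzip-compressed")
    if not Path(str(vcf) + ".tbi").is_file():
        problems.append(f"{vcf.name}.tbi index not found")
    return problems
```

Calling check_snp_vcf("common_all_20180418.vcf.gz") returns an empty list when both files are present and the VCF starts with a BGZF header. It does not verify that the VCF matches the genome build of the reference.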
Now we have all necessary files and are ready to run the test script:
python3 /NGS-PrimerPlex/test.py
All of the tests should complete successfully. If you encounter any errors, please report them in the Issues section on GitHub.
Now you can run the example primer design or a design for your own list of genes:
python3 getGeneRegions.py -glf example_gene_list_file.txt -ref hg19/ -rf example_gene_list_file.regions.csv
python3 NGS_primerplex.py -regions example_gene_list_file.regions.csv -ref hg19/ucsc.hg19.fasta -blast -snps -dbsnp hg19/common_all_20180423_hg19.vcf.gz
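To rerun the two steps for several gene lists, the commands above can be assembled from a few paths. This is a hedged sketch: build_design_commands is a hypothetical helper, it uses only the flags that appear in the commands above, and it assumes the regions file is named after the gene list in the same way as in the example.

```python
from pathlib import Path


def build_design_commands(gene_list, ref_dir, ref_fasta, dbsnp_vcf):
    """Return the two command lines from this guide as argument lists."""
    gene_list = Path(gene_list)
    # Mirrors the guide's naming: example_gene_list_file.txt -> example_gene_list_file.regions.csv
    regions = gene_list.with_name(gene_list.stem + ".regions.csv")
    extract = ["python3", "getGeneRegions.py",
               "-glf", str(gene_list), "-ref", str(ref_dir), "-rf", str(regions)]
    design = ["python3", "NGS_primerplex.py",
              "-regions", str(regions), "-ref", str(ref_fasta),
              "-blast", "-snps", "-dbsnp", str(dbsnp_vcf)]
    return extract, design


if __name__ == "__main__":
    commands = build_design_commands(
        "example_gene_list_file.txt",
        "hg19/",
        "hg19/ucsc.hg19.fasta",
        "hg19/common_all_20180423_hg19.vcf.gz",
    )
    for command in commands:
        # Print only; nothing is executed here.
        print(" ".join(command))
```

Each list can be passed to subprocess.run(command, check=True) from the folder that contains the scripts, so that the second step does not start if the first one fails.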