Example of use GUI for D. melanogaster eye color genes - aakechin/NGS-PrimerPlex GitHub Wiki

Genome preparations

If you have already prepared the reference genome, skip this step. To prepare it, go to the NCBI Genome database, search for Drosophila melanogaster, and click "genome" after the words "Download sequences in FASTA format for".

image

Save the archive and unzip it. WARNING! All files that will be used in the GUI version should be placed in folders close to the root of C:\ whose names contain no non-Latin or non-standard (/,",",_,-,<,> etc) symbols. Otherwise, the GUI will not work.
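The folder names can also be checked with a short script before starting the GUI. The Python sketch below assumes that only ASCII letters and digits are safe in folder names. This is a conservative reading of the warning above, not the GUI's own validation, and `risky_folder_chars` is a hypothetical helper name.

```python
from pathlib import PureWindowsPath
import string

# Assumption: only ASCII letters and digits are treated as safe in folder names.
SAFE = set(string.ascii_letters + string.digits)

def risky_folder_chars(file_path):
    """Return the sorted characters in the folder names of file_path that are not in SAFE."""
    parent = PureWindowsPath(file_path).parent
    # Skip the drive anchor (e.g. "C:\") when the path is absolute.
    names = parent.parts[1:] if parent.anchor else parent.parts
    return sorted({ch for name in names for ch in name if ch not in SAFE})

# Example with a made-up path; an empty list means nothing suspicious was found.
print(risky_folder_chars(r"C:\ngs\dmel\genome.fna"))
```

Only the folders are checked, not the file name itself, because the warning refers to folders.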

Then, go to the bottom of the page and open the links to the genome chromosomes one by one (NC_004354.4, NT_033779.5, NT_033778.4 etc.).

image

At the top of each opened page (e.g. https://www.ncbi.nlm.nih.gov/nuccore/NC_004354.4), click "Send to:" -> "File" -> choose "GenBank (full)" -> "Create File".

image

Each GenBank-file should have the same name that its chromosome has in the reference genome FASTA-file. To find out these names, open the downloaded reference genome FASTA-file with WordPad and look at the first word after ">". Other names can be found by searching for the ">" symbol:

image

For example, the GenBank-file for D. melanogaster chromosome X should be named NC_004354.4.gb.
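If the FASTA-file is too large to open comfortably in WordPad, the sequence names can be listed with a few lines of Python instead. This is only a sketch: the function names are hypothetical, and it assumes that the name is the first word after ">" on each header line.

```python
def fasta_sequence_names(lines):
    """Yield the first word after '>' for every FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]

def expected_genbank_names(fasta_path):
    """Return the GenBank file names that would match the FASTA headers."""
    with open(fasta_path) as handle:
        return [name + ".gb" for name in fasta_sequence_names(handle)]
```

For example, `expected_genbank_names("GCF_000001215.4_Release_6_plus_ISO1_MT_genomic.fna")` returns one name per sequence in the file. The file may contain more sequences than the chromosomes downloaded here.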

Finally, you should have a directory with the following files:

  1. GCF_000001215.4_Release_6_plus_ISO1_MT_genomic.fna - FASTA-file with reference genome
  2. NC_004354.4.gb - chrX
  3. NT_033779.5.gb - chr2L
  4. NT_033778.4.gb - chr2R
  5. NT_037436.4.gb - chr3L
  6. NC_004353.4.gb - chr4
  7. NT_033777.3.gb - chr3R
  8. NC_024512.1.gb - chrY
  9. NC_024511.2.gb - chrMT
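Before moving on, the contents of the directory can be checked with a small script. The sketch below uses the chromosome accessions from this tutorial. `missing_genbank_files` and `check_directory` are hypothetical helper names, and the comparison is case-sensitive, which is an assumption.

```python
import os

# Chromosome accessions used in this tutorial (X, 2L, 2R, 3L, 4, 3R, Y, MT).
CHROMOSOMES = [
    "NC_004354.4", "NT_033779.5", "NT_033778.4", "NT_037436.4",
    "NC_004353.4", "NT_033777.3", "NC_024512.1", "NC_024511.2",
]

def missing_genbank_files(sequence_names, file_names):
    """Return the expected '<name>.gb' files that are absent from file_names."""
    present = set(file_names)
    return [name + ".gb" for name in sequence_names if name + ".gb" not in present]

def check_directory(directory):
    """List the tutorial's GenBank files that are missing from directory."""
    return missing_genbank_files(CHROMOSOMES, os.listdir(directory))
```

An empty list from `check_directory` means that all eight GenBank-files are in place.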

Extraction of genome regions

We chose four genes associated with eye color mutations: chocolate (VhaAC39-1), maroon (Vps16A), mahogany (CG12207), and red Malpighian tubules (CG13646). The list of genes was taken from Paaqua Grant et al. 2016.

To create the NGS panel, go to "Extract regions" in the GUI.

extract_regions_empty

Enter the gene names into the 1st column of the left table. To add a new row, press "Add gene". After that, choose the directory where the GenBank-files of the reference genome were saved. Then choose the reference genome FASTA-file and the output file.

extract_regions_filled

Press "Extract regions". The message "Starting extraction of genome regions..." should appear. Wait until "NGS-PrimerPlex finished!" is shown.

extract_regions_result

The file with the extracted regions will be in the chosen directory.

extract_regions_result_file

Open it in Microsoft Excel (or a similar program). In Microsoft Excel, open the file, go to Data, and split the data into columns. Change the number of multiplexes for each region in the 5th column from 1 to the necessary number of multiplex reactions. This number can be estimated from the following: (1) the total number of primer pairs (about 50-70 primer pairs can be joined together); (2) the number of overlapping regions (overlapping primer pairs cannot be joined); (3) the size of the amplicons (larger amplicons mean fewer overlaps). Initially, we tried to sort the primer pairs into 3 multiplex reactions, so change the value in the 5th column from 1 to "1,2,3" (without quotes) and save the file as a tab-delimited text-file:

extract_regions_result_file2
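The same edit of the 5th column can be made without Excel. The Python sketch below assumes that the regions file is tab-delimited, has no header line, and keeps the multiplex numbers in the 5th column. The function names and the example row are hypothetical.

```python
def set_multiplexes(lines, multiplexes="1,2,3"):
    """Return the rows with the 5th tab-separated column replaced by multiplexes."""
    result = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Rows with fewer than 5 columns are left unchanged.
        if len(fields) >= 5:
            fields[4] = multiplexes
        result.append("\t".join(fields))
    return result

def rewrite_regions_file(source_path, target_path, multiplexes="1,2,3"):
    """Read a regions file, update the 5th column, and save a tab-delimited copy."""
    with open(source_path) as source:
        rows = set_multiplexes(source, multiplexes)
    with open(target_path, "w") as target:
        target.write("\n".join(rows) + "\n")
```

Writing to a new file instead of overwriting the original keeps the unedited output available if the number of multiplexes has to be changed again later.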

Changing default settings

The default settings are intended for DNA samples from formalin-fixed paraffin-embedded (FFPE) tissue specimens. To study germline mutations, as in this case, we can use amplicons of about 150 bp. So, go to "Settings" and change the minimal amplicon length to 130 bp, and the optimal and maximal lengths to 150 bp. You can also decrease the number of primer pairs designed for 1 locus ("Multiplexing" section) to 5, which makes the process faster. The other values can be left at their defaults. If you want, you can save your settings to a file and use them in the future:

image

Running primer design

Go to "Design primers". The fields for the regions file and the reference genome file will be filled in automatically from your choices during the extraction of genome regions. Turn off "Check for covering SNPs" if you don't have a VCF-file with known SNPs for the organism or the target genes. Fill in the left and right adapters with one of the NGS-PrimerPlex examples or with your own sequences (e.g. "ctctctatgggcagtcggtgatt" and "ctgcgtgtctccgactcag").

primer_design1

Press "Start". The initial primer design will take some time (up to 1 hour).

primer_design2

Most likely, the program will report that primers could not be designed for 1-3 regions with the defined parameters.

image

Go back to "Settings" and try to increase Maximal primer length from 28 to 32.

image

Go to "Design primers" again and choose the file with the draft list of primers. This way, primers that were already designed will not be designed again. The XLS-file with the draft list of primers is located in the same directory as your input file.

image Drosophila_choose_draft2

Press "Start". The process will finish quickly (within 1-3 minutes), and this time the program gets past this step. The draft-file was automatically updated after the last run. The next steps will take about 30 more minutes. After that, you will probably get 13 to 24 primer pairs that could not be distributed into 3 multiplex reactions. At this point we can increase the number of multiplexes (if that is convenient for your further work with the panel). A better option is to make the parameters for joining primers into one pool (-minmultdimerdg1 and -minmultdimerdg2) less stringent. So go to "Settings" and change "min dG with linked 3'-ends" from -6 to -7 and "min dG without linked 3'-ends" from -10 to -12:

image

Return to "Design primers" and change the run name if you want to get a new log-file and keep the current primer sets:

image

You can also change the draft-file, because you now have primers that have passed the specificity filter. So, choose the new draft-file that ends with "_after_specificity.xls" and start the primer design again:

Drosophila_change_draft_after_specificity

After about 15 minutes, you will get 10 new combinations in which some primer pairs still could not be distributed to any pool, but now only 1-5 of them. We can save these variants by changing the run name again, to "run3". When only 1-2 primer pairs could not be distributed, we can look at the reasons manually in the file that ends with "_amplicons_multiplex_incompatibility.xls". Sometimes these reasons can be neglected (e.g. for some primer dimers). Sometimes we can simply run the design again: because the distribution of primer pairs into pools is almost stochastic, the program may produce a complete result. In our case this strategy was successful (for 2 combinations, all primers were sorted into pools). The output will be in the file that ends with "_info.xls" and contains the corresponding combination number:

Drosophila_completed_primers

Now we can go to "Add adapters". The file with the 1st combination is already chosen (keep it if this is the variant you need, otherwise replace it), so we only need to choose the file with adapter sequences (see examples of such files in the NGS-PrimerPlex package) and press "Add adapters to primers". That's all!

Other tips

When only a few primer pairs could not be sorted into the necessary number of pools and you don't want to use less strict parameters, you can remove the unsorted primer pairs from one of the created combinations and convert this file to a draft primers file. To do this, open the XLS-file of a combination whose name ends with "_info.xls" and in which only 1-3 primer pairs could not be sorted into any of the multiplexes. Remove each row in which the cell in the last column ("Designed_Multiplex") is empty:

output_with_unsorted

Then, open NGS-PrimerPlex and go to "Convert to draft-primers". Choose the file from which you removed the rows and press "Start":

convert_to_draft_primers

This is a very quick process. Go to "Settings" and change "Primer pairs to design for 1 locus" from 5 to 50. Go back to "Design primers" and change the draft-file to the file that was just created. Press "Start".