Walkthrough - MatthewHiggins2017/bioconda-PrimedRPA GitHub Wiki
PrimedRPA Walkthrough
Estimated Time: 10 minutes
The following walkthrough examples demonstrate the flexibility of the PrimedRPA software. In this tutorial, we will attempt to identify primers that target human papillomavirus (HPV).
Example One
Overview
- Utilise parameters file
- Single sequence input file
- Generate sets of viable primers
Step 1
Prepare the necessary work environment as follows:
mkdir Walk_Through_RPA_Primers
cd ./Walk_Through_RPA_Primers
# Download HPV-126 Virus Genome From NCBI
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/896/435/GCF_000896435.1_ViralProj76727/GCF_000896435.1_ViralProj76727_genomic.fna.gz
gunzip GCF_000896435.1_ViralProj76727_genomic.fna.gz
# Download Parameters File
wget https://raw.githubusercontent.com/MatthewHiggins2017/bioconda-PrimedRPA/master/PrimedRPA_Parameters.txt
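This first example classifies the input as a single sequence (SS), so it is worth confirming that the decompressed FASTA file holds exactly one record. Below is a minimal shell sketch. `count_fasta_records` is a hypothetical helper, not part of PrimedRPA; it counts the header lines that begin with `>`:

```shell
# Hypothetical helper (not part of PrimedRPA): count the records in a FASTA
# file by counting header lines that start with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}
```

If `count_fasta_records GCF_000896435.1_ViralProj76727_genomic.fna` prints 1, that is consistent with the SS classification used below. A larger count would point towards MS instead.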
Edit PrimedRPA_Parameters.txt so that it matches the following:
This parameters file will guide the PrimedRPA-based primer and probe design process. Please follow the instructions outlined below:
----####----
Important Note - Do not remove any of the “>” and write your input directly after this symbol.
----####----
Please define the reference name for this PrimedRPA run:
>HPV_Run_1
Please indicate if you would like to use a previously generated Alignment File: [NO or File path]
>NO
Please indicate if you would like to use the previously generated Binding Sites: [NO or File path]
>NO
Please enter the path, from your current working directory, to the input fasta file:
>GCF_000896435.1_ViralProj76727_genomic.fna
Please classify the contents of the input fasta file as one of the following options: [SS, MS, AMS]. Whereby:
SS = Single sequence
MS = Multiple unaligned sequences
MAS = Multiple aligned sequences
>SS
If multiple sequences are present in the input fasta file (Classification of MS or MAS), please indicate below the
percentage identity required for the primers and probes target binding sites:
>99
Please indicate if a primer identity anchor is required. [NO or length of anchor]
>NO
Desired primer length (This can be a range: 28-32 or fixed value: 32):
>32
Please state if you require a probe to be designed and if so what type [NO,EXO,NFO]
>NO
Desired probe length (This can be a range: 45-50 or fixed value: 50):
>50
Below please define your max amplicon length.
>300
Below please state the repeat nucleotide cut-off in bp (e.g. 5bp will exclude sequences containing GGGGG).
>5
Below please insert the minimum percentage GC content for primer/probe:
>30
Below please insert the maximum percentage GC content for primer/probe:
>70
Below please indicate the percentage match tolerance for primer-probe dimerisation and secondary structure formation:
>80
Please enter [No or Path to Background file] below to identify if you want to perform a background DNA binding check:
>NO
Below please insert the percentage background cross reactivity threshold:
>65
Below please indicate if you would like to implement a Background Hard Fail Filter [NO,YES]:
>NO
Please define the maximum number of sets you would like to identify:
>5
Please define the number of threads available:
>2
Blastn Cross Reactivity Search Settings [Basic or Advanced or Fast]
>Fast
Blastn Evalue
>1000
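If you prefer not to edit the file by hand, you can also set the answers from the shell. The sketch below assumes only what the file itself states: each answer sits on a line beginning with `>` after its prompt. `set_param` is a hypothetical helper, not part of PrimedRPA. It rewrites the first answer line that follows the first prompt containing the given text:

```shell
# Hypothetical helper (not part of PrimedRPA): set the answer that follows the
# first prompt line containing PROMPT_TEXT. Answers are the lines beginning
# with ">", as described in the parameters file itself.
# Usage: set_param FILE PROMPT_TEXT NEW_VALUE
set_param() {
    file=$1
    prompt=$2
    value=$3
    tmp=$(mktemp) || return 1
    if awk -v prompt="$prompt" -v value="$value" '
        # Arm on the first prompt that contains the text (literal match).
        !done && !armed && index($0, prompt) { armed = 1 }
        # Replace only the next answer line after that prompt.
        armed && /^>/ { print ">" value; armed = 0; done = 1; next }
        { print }
        # Exit non-zero if the prompt was never found.
        END { exit !done }
    ' "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        echo "prompt not found: $prompt" >&2
        return 1
    fi
}
```

For example: `set_param PrimedRPA_Parameters.txt "reference name for this PrimedRPA run" HPV_Run_1`. Matching on a distinctive fragment of the prompt, rather than on a fixed line number, keeps the helper working if the template's line layout shifts.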
Step 2
Now that the parameters file has been adjusted, we can begin the analysis with the following command:
PrimedRPA PrimedRPA_Parameters.txt
First, an alignment summary is generated; however, as we are only using a single sequence in this first example, we can ignore it for now.
HPV_Run_1_Alignment_Summary.csv
Next, a file containing all of the potential oligo binding sites will be generated:
HPV_Run_1_PrimedRPA_Oligo_Binding_Sites.csv
Finally, the output file will be generated:
HPV_Run_1_Output_Sets.csv
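A small script can confirm that a run produced the files listed above. This is a sketch. `check_run_outputs` is a hypothetical helper, and it assumes the `<RunID>_<suffix>` file naming seen in this walkthrough:

```shell
# Hypothetical helper (not part of PrimedRPA): report which of the three files
# described above exist for a RunID; returns 1 if any of them is missing.
check_run_outputs() {
    missing=0
    for suffix in Alignment_Summary.csv PrimedRPA_Oligo_Binding_Sites.csv Output_Sets.csv; do
        f="${1}_${suffix}"
        if [ -f "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

When binding sites are loaded from an earlier run, only the output sets file is expected. In that case, the other two files being reported as missing is normal.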
On inspecting the output file, we can see that the Max Dimerisation Score is rather high for most sets.
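To see which sets are least and most affected, you can order the output by the Max Dimerisation Score column. Below is a minimal sketch. It assumes the output is a plain comma-separated file whose header contains exactly that column name and whose fields contain no quoted commas. `sort_by_column` is a hypothetical helper:

```shell
# Hypothetical helper (not part of PrimedRPA): print a CSV with its data rows
# sorted in ascending numeric order by the named column.
# Assumes plain comma-separated fields with no quoted commas.
sort_by_column() {
    file=$1
    column=$2
    # Find the 1-based index of the column in the header row.
    col=$(head -n 1 "$file" | awk -F',' -v name="$column" '
        { for (i = 1; i <= NF; i++) if ($i == name) { print i; exit } }')
    if [ -z "$col" ]; then
        echo "column not found: $column" >&2
        return 1
    fi
    head -n 1 "$file"
    # C locale so that "." is always read as the decimal point.
    tail -n +2 "$file" | LC_ALL=C sort -t ',' -k "${col},${col}" -n
}
```

Running `sort_by_column HPV_Run_1_Output_Sets.csv "Max Dimerisation Score"` prints the header followed by the sets from lowest to highest score.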
Step 3
To obtain better candidates, we can increase the filter stringency and re-run the analysis. In addition, to save computation time, we can load the previously generated binding sites instead of recomputing them.
To achieve this, edit the parameters file as follows:
This parameters file will guide the PrimedRPA-based primer and probe design process. Please follow the instructions outlined below:
----####----
Important Note - Do not remove any of the “>” and write your input directly after this symbol.
----####----
Please define the reference name for this PrimedRPA run:
>HPV_Run_2
Please indicate if you would like to use a previously generated Alignment File: [NO or File path]
>NO
Please indicate if you would like to use the previously generated Binding Sites: [NO or File path]
>HPV_Run_1_PrimedRPA_Oligo_Binding_Sites.csv
Please enter the path, from your current working directory, to the input fasta file:
>GCF_000896435.1_ViralProj76727_genomic.fna
Please classify the contents of the input fasta file as one of the following options: [SS, MS, AMS]. Whereby:
SS = Single sequence
MS = Multiple unaligned sequences
MAS = Multiple aligned sequences
>SS
If multiple sequences are present in the input fasta file (Classification of MS or MAS), please indicate below the
percentage identity required for the primers and probes target binding sites:
>99
Please indicate if a primer identity anchor is required. [NO or length of anchor]
>NO
Desired primer length (This can be a range: 28-32 or fixed value: 32):
>32
Please state if you require a probe to be designed and if so what type [NO,EXO,NFO]
>NO
Desired probe length (This can be a range: 45-50 or fixed value: 50):
>50
Below please define your max amplicon length.
>300
Below please state the repeat nucleotide cut-off in bp (e.g. 5bp will exclude sequences containing GGGGG).
>5
Below please insert the minimum percentage GC content for primer/probe:
>30
Below please insert the maximum percentage GC content for primer/probe:
>70
Below please indicate the percentage match tolerance for primer-probe dimerisation and secondary structure formation:
>35
Please enter [No or Path to Background file] below to identify if you want to perform a background DNA binding check:
>NO
Below please insert the percentage background cross reactivity threshold:
>35
Below please indicate if you would like to implement a Background Hard Fail Filter [NO,YES]:
>NO
Please define the maximum number of sets you would like to identify:
>5
Please define the number of threads available:
>2
Blastn Cross Reactivity Search Settings [Basic or Advanced or Fast]
>Fast
Blastn Evalue
>1000
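Keeping a copy of each run's parameters file makes it easy to confirm exactly which answers changed between runs. For example: `cp PrimedRPA_Parameters.txt HPV_Run_2_Parameters.txt` (the copy names are only a suggestion). Below is a sketch that compares two such copies answer by answer, in file order. `changed_answers` is a hypothetical helper, not part of PrimedRPA:

```shell
# Hypothetical helper (not part of PrimedRPA): list the answers (lines
# beginning with ">") that differ between two parameters files, matched by
# their position in the file.
changed_answers() {
    awk '
        # First file: remember each answer in order, without the ">".
        FNR == NR { if (/^>/) old[++n] = substr($0, 2); next }
        # Second file: compare answers position by position.
        /^>/ {
            m++
            new = substr($0, 2)
            if (old[m] != new) printf "answer %d: %s -> %s\n", m, old[m], new
        }
    ' "$1" "$2"
}
```

Positional matching works here because every run uses the same template, so the Nth answer always belongs to the same question.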
Then re-run the analysis:
PrimedRPA PrimedRPA_Parameters.txt
This time, only a single output file is generated (shown below), because the binding sites from the previous run are reused. In addition, all candidates now look more suitable to carry forward for laboratory testing because of their lower dimerisation scores.
HPV_Run_2_Output_Sets.csv
Example Two
Overview
- Utilise command line parameters
- Generate primer sets
- Run a cross reactivity check
Step 1
Due to an unforeseen labelling error, we have clinical samples that could contain either HIV or HPV. To distinguish the HPV-containing samples, we want to design primers that have no potential for cross-reactivity with HIV.
# Download HIV-2 Genome From NCBI
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/098/135/GCA_003098135.1_ASM309813v1/GCA_003098135.1_ASM309813v1_genomic.fna.gz
gunzip GCA_003098135.1_ASM309813v1_genomic.fna.gz
Step 2
This time, we are going to perform the analysis using only command-line options. In addition, as we are now adding a background binding check, we have to regenerate the binding sites (please see Parameters Options for more information). However, as the input file has not changed, we can reuse the alignment file generated previously (HPV_Run_1).
PrimedRPA --RunID HPV_Run_3 \
    --PriorAlign HPV_Run_1_Alignment_Summary.csv \
    --PrimerLength 32 \
    --AmpliconSizeLimit 500 \
    --NucleotideRepeatLimit 5 \
    --MinGC 30 \
    --MaxGC 70 \
    --DimerisationThresh 40 \
    --BackgroundCheck GCA_003098135.1_ASM309813v1_genomic.fna \
    --CrossReactivityThresh 40
Please look through HPV_Run_3_Output_Sets.csv to inspect the potential candidates generated. The additional column, Max Background Cross Reactivity Score, is the highest cross-reactivity score among all oligos within a given candidate set.
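You can preview which sets would pass a stricter cross-reactivity cut-off without re-running PrimedRPA by filtering the output on that column. This sketch assumes a plain comma-separated file with no quoted commas, a header containing exactly that column name, and plain numbers in the column. `filter_by_column_max` is a hypothetical helper:

```shell
# Hypothetical helper (not part of PrimedRPA): keep the header plus every row
# whose value in the named column is at or below the given limit.
# Assumes plain comma-separated fields with no quoted commas.
filter_by_column_max() {
    awk -F',' -v name="$2" -v limit="$3" '
        NR == 1 {
            # Locate the column by its header name.
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            if (!col) { print "column not found: " name > "/dev/stderr"; exit 1 }
            print
            next
        }
        # A bare pattern prints the matching row.
        ($col + 0) <= (limit + 0)
    ' "$1"
}
```

For example: `filter_by_column_max HPV_Run_3_Output_Sets.csv "Max Background Cross Reactivity Score" 30`. The value 30 is purely illustrative.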
Also, as we have included a cross-reactivity check, the blastn output files for all oligo binding sites that fell below the threshold are stored in the following location:
./GCA_003098135_Blastn_DB_PrimedRPA/HPV_Run_3/<Oligo_Binding_Site_Sequence>_Blastn_Output.csv
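To check how many binding sites were written out by the blastn step, you can list the files by RunID. This sketch assumes the directory layout shown above. `list_blastn_outputs` is a hypothetical helper:

```shell
# Hypothetical helper (not part of PrimedRPA): print the path of every
# *_Blastn_Output.csv file stored for a RunID, assuming the layout
# ./<database>_Blastn_DB_PrimedRPA/<RunID>/<sequence>_Blastn_Output.csv
list_blastn_outputs() {
    for dir in ./*_Blastn_DB_PrimedRPA/"$1"; do
        # An unmatched glob stays literal, so skip anything that is not a directory.
        [ -d "$dir" ] || continue
        for f in "$dir"/*_Blastn_Output.csv; do
            if [ -f "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
}
```

`list_blastn_outputs HPV_Run_3 | wc -l` then gives the number of blastn output files stored for this run.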
Example Three
Overview
- Utilise parameters file
- Generate primers & Exo probes
Step 1
Now we want to quantify the concentration of HPV DNA in our samples, which requires fluorescent Exo probes. Therefore, we need to re-run the analysis with the Exo probe preference added to the parameters file, as follows:
This parameters file will guide the PrimedRPA-based primer and probe design process. Please follow the instructions outlined below:
----####----
Important Note - Do not remove any of the “>” and write your input directly after this symbol.
----####----
Please define the reference name for this PrimedRPA run:
>HPV_Run_4
Please indicate if you would like to use a previously generated Alignment File: [NO or File path]
>HPV_Run_1_Alignment_Summary.csv
Please indicate if you would like to use the previously generated Binding Sites: [NO or File path]
>NO
Please enter the path, from your current working directory, to the input fasta file:
>GCF_000896435.1_ViralProj76727_genomic.fna
Please classify the contents of the input fasta file as one of the following options: [SS, MS, AMS]. Whereby:
SS = Single sequence
MS = Multiple unaligned sequences
MAS = Multiple aligned sequences
>SS
If multiple sequences are present in the input fasta file (Classification of MS or MAS), please indicate below the
percentage identity required for the primers and probes target binding sites:
>99
Please indicate if a primer identity anchor is required. [NO or length of anchor]
>NO
Desired primer length (This can be a range: 28-32 or fixed value: 32):
>32
Please state if you require a probe to be designed and if so what type [NO,EXO,NFO]
>EXO
Desired probe length (This can be a range: 45-50 or fixed value: 50):
>50
Below please define your max amplicon length.
>300
Below please state the repeat nucleotide cut-off in bp (e.g. 5bp will exclude sequences containing GGGGG).
>5
Below please insert the minimum percentage GC content for primer/probe:
>30
Below please insert the maximum percentage GC content for primer/probe:
>70
Below please indicate the percentage match tolerance for primer-probe dimerisation and secondary structure formation:
>80
Please enter [No or Path to Background file] below to identify if you want to perform a background DNA binding check:
>NO
Below please insert the percentage background cross reactivity threshold:
>65
Below please indicate if you would like to implement a Background Hard Fail Filter [NO,YES]:
>NO
Please define the maximum number of sets you would like to identify:
>5
Please define the number of threads available:
>2
Blastn Cross Reactivity Search Settings [Basic or Advanced or Fast]
>Fast
Blastn Evalue
>1000
Again, trigger analysis with the following command:
PrimedRPA PrimedRPA_Parameters.txt
Inspect the output file (below) to see the potential candidate sets. In each generated probe sequence, two thymine residues are situated approximately two-thirds of the way along the probe; these can be exchanged for the fluorescent marker and quencher, respectively.
HPV_Run_4_Output_Sets.csv
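To help spot those thymines in a probe by eye, you can list the T positions alongside the point two-thirds of the way along the sequence. This is only a reading aid based on that rule of thumb. `thymine_positions` is a hypothetical helper and does not reproduce how PrimedRPA chooses the positions:

```shell
# Hypothetical helper (not part of PrimedRPA): print the 1-based positions of
# every T in a probe sequence, then the position two-thirds of the way along.
thymine_positions() {
    printf '%s\n' "$1" | awk '
        {
            seq = toupper($0)
            out = ""
            for (i = 1; i <= length(seq); i++)
                if (substr(seq, i, 1) == "T") out = out (out == "" ? "" : " ") i
            print "T positions: " out
            # Integer part of two-thirds of the probe length.
            printf "two-thirds point: %d of %d\n", int(length(seq) * 2 / 3), length(seq)
        }'
}
```

For example, `thymine_positions ACGTTACT` reports T residues at positions 4, 5 and 8, with the two-thirds point at position 5 of 8.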
Example Four
Overview
- Utilise parameters file
- Input multiple unaligned FASTA sequences
- Generate primers & Exo probes