ParameterDescriptions - MatthewHiggins2017/bioconda-PrimedRPA GitHub Wiki
Estimated Time: 5 minutes
PrimedRPA now accepts parameters via two alternative mechanisms:
1. A parameters text file:
PrimedRPA PrimedRPA_Parameters.txt
2. Command line variables:
PrimedRPA --RunID Run_1 --PriorAlign Run_0_Alignment_Summary.csv --PrimerLength 32 --AmpliconSizeLimit 500
## Parameter Summary Breakdown
Parameter | Description | Default |
---|---|---|
RunID | The associated Run ID given to any analysis. This will be used to name the output files generated. | N/A |
PriorAlign | Options: NO - Don't use a previously generated alignment file. <File Path> - Path to a previously generated alignment file. (Explained in more detail below) | NO |
PriorBindingSite | Options: NO - Don't use a previously generated binding sites file. <File Path> - Path to a previously generated binding sites file. (Explained in more detail below) | NO |
InputFile | The path to the target fasta file | N/A |
InputFileType | The contents of the input fasta file can be classified as: SS - A single sequence; MS - Multiple sequences (unaligned); MAS - Multiple aligned sequences. | SS |
IdentityThreshold | The binding site specific identity threshold. (Explained in more detail below) | 99 |
ConservedAnchor | The number of nucleotides from the 3' primer terminus which require a 100% identity score. | 3 |
PrimerLength | The desired primer length or range, e.g. 30 or 28-32 | 30 |
ProbeRequired | Options: NO - No probe required; EXO - Exo probe required; NFO - Nfo probe required. | NO |
ProbeLength | The desired probe length or range, e.g. 50 or 45-55 | 50 |
AmpliconSizeLimit | The upper limit for the amplicon length. | 500 |
NucleotideRepeatLimit | The number of tolerated single nucleotide repeats | 5 |
MinGC | Minimum GC Content % | 30 |
MaxGC | Maximum GC Content % | 70 |
DimerisationThresh | The number of sites in an oligo which could cause dimerisation, relative to the sequence length, expressed as a percentage. (Explained in more detail below) | 40 |
BackgroundCheck | Options: NO - No background binding check required. <File Path> - Path to a single fasta file containing all the background sequences which all potential binding sites will be checked against. | NO |
CrossReactivityThresh | The percentage threshold between a given binding site in the target and a similar binding site in potential background sequences provided. (Explained in more detail below) | 65 |
HardCrossReactFilter | Adds a hard-fail option to the cross-reactivity search. Options: YES or NO. (Explained in more detail below) | NO |
MaxSets | The max number of primer-probe sets to be exported. | 100 |
Threads | The number of threads available to parallelise the primer/probe search process over. | 1 |
BackgroundSearchSensitivity | An option to alter the Blastn settings, which affects the sensitivity and speed of the cross-reactivity search. Speed: Fast > Basic > Advanced. Sensitivity: Advanced > Basic > Fast. | Basic |
The PriorAlign option can be used if you plan to rerun the analysis on the same target (InputFile).
The PriorBindingSite option can be used if you want to rerun the analysis for a given target with more stringent parameters, for example reducing the DimerisationThresh parameter from 40 to 15.
However, it is necessary to regenerate the binding sites file under the following conditions:
- If the ProbeRequired parameter is altered.
- If the BackgroundCheck parameter is altered.
- If the InputFile parameter is altered.
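The three conditions above can be expressed as a small check. This is only an illustrative sketch: the function name, the dictionary layout and the file name `target.fasta` are hypothetical and are not part of PrimedRPA.

```python
# Parameters listed above as requiring a fresh binding sites file when altered.
REGENERATE_IF_CHANGED = ("ProbeRequired", "BackgroundCheck", "InputFile")


def can_reuse_binding_sites(previous, current):
    """Return True if none of the listed parameters differ between two runs."""
    return all(previous.get(key) == current.get(key) for key in REGENERATE_IF_CHANGED)


# Hypothetical parameter sets for illustration only.
previous_run = {
    "InputFile": "target.fasta",
    "ProbeRequired": "NO",
    "BackgroundCheck": "NO",
    "DimerisationThresh": 40,
}
stricter_run = dict(previous_run, DimerisationThresh=15)
with_probe = dict(previous_run, ProbeRequired="EXO")

print(can_reuse_binding_sites(previous_run, stricter_run))
print(can_reuse_binding_sites(previous_run, with_probe))
```

Tightening DimerisationThresh leaves the three listed parameters unchanged, so the first check allows reuse, whereas switching ProbeRequired does not.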
As explained in Output File Descriptions, if the input file contains multiple sequences, each index position in the alignment is assigned an identity score. Each oligo binding site is then assigned an overall score equal to the mean of the identity scores of the index positions it covers, and this score is compared against the IdentityThreshold parameter. For example:
Abundance | Index Position | Nucleotide |
---|---|---|
1 | 0 | A |
1 | 1 | C |
1 | 2 | G |
1 | 3 | A |
0.75 | 4 | A |
1 | 5 | A |
1 | 6 | A |
1 | 7 | T |
1 | 8 | A |
0.75 | 9 | T |
1 | 10 | A |
1 | 11 | G |
1 | 12 | G |
An oligo whose binding site covers index positions 0-12 will have an Identity Score of 0.962 (12.5 / 13). Therefore, under the default IdentityThreshold parameter, this oligo binding site will be excluded (0.962 x 100 = 96.2 < 99).
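The calculation above can be reproduced with a short sketch. The function names are hypothetical, and treating a score exactly equal to the threshold as a pass is an assumption, since the text only shows a score below the threshold being excluded. The ConservedAnchor requirement is not modelled here.

```python
from statistics import mean

# Per-position identity scores (the "Abundance" column) for index
# positions 0-12, copied from the table above.
identity_scores = [1, 1, 1, 1, 0.75, 1, 1, 1, 1, 0.75, 1, 1, 1]


def binding_site_identity(scores, start, end):
    """Mean identity score over index positions start..end (inclusive)."""
    return mean(scores[start:end + 1])


def passes_identity_threshold(scores, start, end, threshold=99):
    """Compare the binding site score, as a percentage, with the threshold.

    Assumption: a score equal to the threshold is accepted.
    """
    return binding_site_identity(scores, start, end) * 100 >= threshold


score = binding_site_identity(identity_scores, 0, 12)
print(round(score, 3))  # 0.962
print(passes_identity_threshold(identity_scores, 0, 12))  # False
```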
If a cross-reactivity check is required, the specified fasta file, containing all background sequences, is converted into a Blastn database. Each oligo binding site is then checked against this database, and a cross-reactivity score is generated for each hit as follows:
Cross Reactivity Score (CRS) = ((LA * (PI/100))/LQ) * 100
LA = Length of Oligo Binding Site Alignment
PI = Percentage Identity
LQ = Length of Oligo Binding Site
Each hit is then ranked according to its cross-reactivity score, and the maximum cross-reactivity score for a given oligo binding site is identified. If this maximum score is above the CrossReactivityThresh parameter, the binding site is excluded.
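As a minimal sketch of the formula and the filtering step described above: the function names and the example hits are hypothetical, and each hit is represented simply as an (LA, PI) pair rather than as real Blastn output.

```python
def cross_reactivity_score(alignment_length, percent_identity, site_length):
    """CRS = ((LA * (PI / 100)) / LQ) * 100."""
    return (alignment_length * (percent_identity / 100)) / site_length * 100


def max_cross_reactivity(hits, site_length):
    """Highest CRS over all hits; hits is an iterable of (LA, PI) pairs."""
    scores = (cross_reactivity_score(la, pi, site_length) for la, pi in hits)
    return max(scores, default=0.0)


def is_excluded(hits, site_length, threshold=65):
    """A binding site is excluded when its maximum CRS is above the threshold."""
    return max_cross_reactivity(hits, site_length) > threshold


# Hypothetical hits for a 32 nt binding site: (alignment length, % identity).
example_hits = [(20, 100.0), (30, 90.0)]
print(max_cross_reactivity(example_hits, 32))
print(is_excluded(example_hits, 32))
```

In this made-up example the first hit scores 62.5 and the second about 84.4, so the binding site would be excluded under the default CrossReactivityThresh of 65.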
The HardCrossReactFilter parameter was added to provide a stringent cross-reactivity filter. Research has shown that primers as short as 18 nucleotides can result in successful RPA-based amplification and that mismatches within the primer binding region can be tolerated if located away from the 3' primer terminus. Ref: DOI: 10.1128/mBio.00135-13 & Utilising Short Primers. Therefore, the hard-fail cross-reactivity filter follows the logic described below:
1. - For each cross-reactivity hit, complementarity is assessed over the 22 bp downstream from the 5' end and the 22 bp upstream from the 3' end of the oligo binding site. Please see below:
AAAACAACGTCGGCCCCAAGGTTTACCCAATAA - Oligo Binding Site
||||||-||-||-||||||||||||||||||||
TTTTGTCGCGGCTGGGGTTCCAAATGGGTTATT - Background Sequence
5': AAAACAACGTCGGCCCCAAGGT
3': GGCCCCAAGGTTTACCCAATAA
2. - A score is then derived for each direction of potential binding based on the number of complementary (+1) and mismatch (-1) sites. A weighting system is applied to complementary sites towards the binding site terminus as follows:
Position | Weighting |
---|---|
Terminus (t) | 3 |
t-1 or t+1 | 2 |
t-2 or t+2 | 1.5 |
AAAACAACGTCGGCCCCAAGGT 5' Score = 3+2+1.5+(1x16)+(3x-1) = 19.5
GGCCCCAAGGTTTACCCAATAA 3' Score = 3+2+1.5+(1x18)+(1x-1) = 23.5
3. - If the overall score, for either direction, is greater than or equal to 21.5, the oligo binding site is marked as a Hard Fail and excluded from downstream analysis. Therefore, the example above would be marked as a hard fail because its 3' score (23.5) is greater than 21.5.
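The three steps above can be sketched as follows, using the example sequences and the terminus weights 3, 2 and 1.5 that appear in the worked scores. This is an illustration only, not PrimedRPA's own code: the function names are hypothetical, and two details are assumptions (the weighted terminus is taken to be the outer end of each 22-site window, and a mismatch at a weighted position is scored as -1 like any other mismatch).

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
WINDOW = 22                      # sites assessed from each end
TERMINUS_WEIGHTS = (3, 2, 1.5)   # weights for t, t+/-1, t+/-2
HARD_FAIL_SCORE = 21.5


def complementary_sites(oligo, background):
    """True/False per position, comparing the two strands as written above."""
    return [COMPLEMENT.get(a) == b for a, b in zip(oligo, background)]


def window_score(sites):
    """Score a window whose first element is the terminus position."""
    score = 0.0
    for offset, is_match in enumerate(sites):
        if is_match:
            # Complementary sites near the terminus carry extra weight.
            score += TERMINUS_WEIGHTS[offset] if offset < len(TERMINUS_WEIGHTS) else 1
        else:
            score -= 1  # assumption: every mismatch costs 1, weighted or not
    return score


def hard_fail(oligo, background):
    sites = complementary_sites(oligo, background)
    five_prime = window_score(sites[:WINDOW])
    three_prime = window_score(sites[-WINDOW:][::-1])  # terminus is the last site
    failed = max(five_prime, three_prime) >= HARD_FAIL_SCORE
    return five_prime, three_prime, failed


oligo_site = "AAAACAACGTCGGCCCCAAGGTTTACCCAATAA"
background_seq = "TTTTGTCGCGGCTGGGGTTCCAAATGGGTTATT"
print(hard_fail(oligo_site, background_seq))  # (19.5, 23.5, True)
```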
The DimerisationThresh parameter is used in two stages:
1. - To assess the potential for individual oligos to self-dimerise. For example:
CAATAAGAAATATTTCCAAAACTTAAGACCGC
-|--|---||-|-||---||-|-||---|--|---
CGCCAGAATTCAAAACCTTTATAAAGAATAAC
This oligo is complementary at 14 sites out of 32. Therefore, it would be given a dimerisation score of (14/32)*100 = 43.75. This is above the default threshold of 40, so this oligo would be excluded.
2. - To assess whether the primers/probes within any given set could dimerise with each other. The example below shows the dimerisation potential between a forward primer and a probe. As shown, there are 11 complementary sites. As the two sequences differ in length, the length of the shorter sequence is used. Therefore, the dimerisation score is (11/32)*100 = 34.38%, which falls below the default threshold.
TTGTTTTGCCTGCACCTTTGCTTTGTGAGGAG
-------------------||--------||||-|-|-----||----|--------------------------------
TTTATTGTAATCGTGGAGTCAGGCTTTCTGTGGTAGCATCTGACGGAGCA
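Both dimerisation scores above can be reproduced with a short sketch. It takes the match lines from the two examples as given and only counts the `|` characters; how PrimedRPA itself searches for the alignment is not covered here, and the function names are hypothetical.

```python
def dimerisation_score(match_line, seq_a, seq_b):
    """Complementary sites as a percentage of the shorter sequence length."""
    complementary = match_line.count("|")
    return complementary / min(len(seq_a), len(seq_b)) * 100


def exceeds_threshold(score, threshold=40):
    """An oligo or set is excluded when the score is above the threshold."""
    return score > threshold


# Stage 1: self-dimerisation example from above.
oligo = "CAATAAGAAATATTTCCAAAACTTAAGACCGC"
oligo_reversed = "CGCCAGAATTCAAAACCTTTATAAAGAATAAC"
self_match = "-|--|---||-|-||---||-|-||---|--|---"
self_score = dimerisation_score(self_match, oligo, oligo_reversed)

# Stage 2: forward primer against probe example from above.
forward_primer = "TTGTTTTGCCTGCACCTTTGCTTTGTGAGGAG"
probe = "TTTATTGTAATCGTGGAGTCAGGCTTTCTGTGGTAGCATCTGACGGAGCA"
pair_match = "-------------------||--------||||-|-|-----||----|--------------------------------"
pair_score = dimerisation_score(pair_match, forward_primer, probe)

print(self_score, exceeds_threshold(self_score))  # 43.75 True
print(pair_score, exceeds_threshold(pair_score))  # 34.375 False
```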
Please see the table below for the specific Blastn parameters used under each BackgroundSearchSensitivity setting:
Option | Word Size | Gap Open | Gap Extend | Reward | Penalty |
---|---|---|---|---|---|
Fast | 7 | 5 | 2 | 1 | -3 |
Basic | 4 | 5 | 2 | 1 | -2 |
Advanced | 4 | 5 | 2 | 1 | -1 |
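For readers who want to run a comparable search by hand, the table can be turned into blastn command-line arguments. The sketch below only builds the argument list. The flags `-word_size`, `-gapopen`, `-gapextend`, `-reward` and `-penalty` are standard blastn options, but how PrimedRPA itself invokes Blastn is not shown in this page, so treat the mapping as an assumption; the names `BLASTN_SETTINGS` and `blastn_args` are hypothetical.

```python
# Values copied from the table above, keyed by blastn option name.
BLASTN_SETTINGS = {
    "Fast": {"word_size": 7, "gapopen": 5, "gapextend": 2, "reward": 1, "penalty": -3},
    "Basic": {"word_size": 4, "gapopen": 5, "gapextend": 2, "reward": 1, "penalty": -2},
    "Advanced": {"word_size": 4, "gapopen": 5, "gapextend": 2, "reward": 1, "penalty": -1},
}


def blastn_args(sensitivity="Basic"):
    """Return blastn option/value pairs for one sensitivity setting."""
    args = []
    for option, value in BLASTN_SETTINGS[sensitivity].items():
        args += [f"-{option}", str(value)]
    return args


print(" ".join(blastn_args("Fast")))
```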
You can rename the parameters file by adding a prefix, e.g. Run_One_PrimedRPA_Parameters.txt. However, the parameters file will not be recognised if the string 'PrimedRPA_Parameters.txt' itself is altered.
To see help for each command line variable, run the following command:
PrimedRPA --help