ParameterDescriptions - MatthewHiggins2017/bioconda-PrimedRPA GitHub Wiki

PrimedRPA Parameter Breakdown

Estimated Time: 5 minutes

PrimedRPA now accepts parameters via two alternative mechanisms:

1. A parameters text file:

PrimedRPA PrimedRPA_Parameters.txt

2. Command line variables:

PrimedRPA --RunID Run_1 --PriorAlign Run_0_Alignment_Summary.csv --PrimerLength 32 --AmpliconSizeLimit 500  
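
When runs are launched from a script, the command-line form above can be assembled programmatically. The sketch below is only an illustration: `build_primedrpa_command` is a hypothetical helper, and it assumes that every parameter is passed as a `--Name value` pair in the same way as the four flags shown in the example command.

```python
from typing import Dict, List


def build_primedrpa_command(params: Dict[str, object]) -> List[str]:
    """Turn a parameter dict into an argument list for the PrimedRPA CLI.

    Assumption: each parameter is passed as `--Name value`, mirroring the
    example command shown in this section.
    """
    command = ["PrimedRPA"]
    for name, value in params.items():
        command.extend([f"--{name}", str(value)])
    return command


example = build_primedrpa_command(
    {
        "RunID": "Run_1",
        "PriorAlign": "Run_0_Alignment_Summary.csv",
        "PrimerLength": 32,
        "AmpliconSizeLimit": 500,
    }
)
print(" ".join(example))
```

The resulting list can be handed to `subprocess.run` without any shell quoting.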

## Parameter Summary Breakdown

| Parameter | Description | Default |
|---|---|---|
| RunID | The run ID assigned to the analysis. It is used to name the output files generated. | N/A |
| PriorAlign | Options:<br>NO - Don't use a previously generated alignment file.<br>`<File Path>` - Path to a previously generated alignment file.<br>(Explained in more detail below) | NO |
| PriorBindingSite | Options:<br>NO - Don't use a previously generated binding sites file.<br>`<File Path>` - Path to a previously generated binding sites file.<br>(Explained in more detail below) | NO |
| InputFile | The path to the target fasta file. | N/A |
| InputFileType | The contents of the input fasta file can be classified as:<br>SS - A single sequence<br>MS - Multiple sequences (unaligned)<br>MAS - Multiple aligned sequences | SS |
| IdentityThreshold | The binding-site-specific identity threshold. (Explained in more detail below) | 99 |
| ConservedAnchor | The number of nucleotides from the 3' primer terminus which require a 100% identity score. | 3 |
| PrimerLength | The desired primer length or range, e.g. 30 or 28-32. | 30 |
| ProbeRequired | The options are as follows:<br>NO - No probe required<br>EXO - Exo probe required<br>NFO - Nfo probe required | NO |
| ProbeLength | The desired probe length or range, e.g. 50 or 45-55. | 50 |
| AmpliconSizeLimit | The upper limit for the amplicon length. | 500 |
| NucleotideRepeatLimit | The number of tolerated single nucleotide repeats. | 5 |
| MinGC | Minimum GC content (%). | 30 |
| MaxGC | Maximum GC content (%). | 70 |
| DimerisationThresh | The number of sites in an oligo which could cause dimerisation, relative to the sequence length, expressed as a percentage. (Explained in more detail below) | 40 |
| BackgroundCheck | Options:<br>NO - No background binding check required.<br>`<File Path>` - Path to a single fasta file containing all the background sequences against which all potential binding sites will be checked. | NO |
| CrossReactivityThresh | The percentage threshold between a given binding site in the target and a similar binding site in the background sequences provided. (Explained in more detail below) | 65 |
| HardCrossReactFilter | Adds a hard-fail option to the cross-reactivity search. Options: YES or NO. (Explained in more detail below) | NO |
| MaxSets | The maximum number of primer-probe sets to be exported. | 100 |
| Threads | The number of threads available over which to parallelise the primer/probe search. | 1 |
| BackgroundSearchSensitivity | An option to alter the Blastn settings, which affects the sensitivity and speed of the cross-reactivity search.<br>Speed: Fast > Basic > Advanced<br>Sensitivity: Advanced > Basic > Fast | Basic |
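
To make the MinGC, MaxGC and NucleotideRepeatLimit entries concrete, here is a minimal sketch of the kind of check they describe. It is not PrimedRPA's own code: the function names are hypothetical, the GC bounds are assumed to be inclusive, and NucleotideRepeatLimit is assumed to mean the longest tolerated run of one repeated nucleotide.

```python
def gc_percent(oligo: str) -> float:
    """Return the GC content of an oligo as a percentage."""
    if not oligo:
        raise ValueError("oligo must not be empty")
    oligo = oligo.upper()
    return 100.0 * sum(base in "GC" for base in oligo) / len(oligo)


def longest_single_nucleotide_run(oligo: str) -> int:
    """Return the length of the longest run of one repeated nucleotide."""
    longest = current = 0
    previous = ""
    for base in oligo.upper():
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest


def passes_basic_filters(oligo: str, min_gc: float = 30, max_gc: float = 70,
                         repeat_limit: int = 5) -> bool:
    """Apply the GC and repeat checks using the default values from the table.

    Assumptions: bounds are inclusive and a run equal to the limit is tolerated.
    """
    return (min_gc <= gc_percent(oligo) <= max_gc
            and longest_single_nucleotide_run(oligo) <= repeat_limit)
```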

## Detailed Parameter Breakdown

### PriorAlign

This option can be used if you are planning to rerun analysis on the same target (InputFile).

### PriorBindingSite

This option can be used if you want to re-run the analysis for a given target whilst improving parameter stringency. For example, reducing the DimerisationThresh parameter from 40 to 15.

However, it is necessary to regenerate the binding sites file under the following conditions:

  • If the ProbeRequired parameter is altered.
  • If the BackgroundCheck parameter is altered.
  • If the InputFile parameter is altered.
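
The three conditions above can be expressed as a simple comparison between the parameters of the earlier run and those of the new run. This is a sketch under stated assumptions: `can_reuse_binding_sites` is a hypothetical helper, the parameters are assumed to be held in plain dictionaries, and the file name `target.fasta` is made up.

```python
# Parameters whose alteration requires the binding sites file to be regenerated.
REGENERATION_TRIGGERS = ("ProbeRequired", "BackgroundCheck", "InputFile")


def can_reuse_binding_sites(previous: dict, current: dict) -> bool:
    """Return True if none of the triggering parameters differ between runs."""
    return all(previous.get(name) == current.get(name)
               for name in REGENERATION_TRIGGERS)


previous_run = {
    "InputFile": "target.fasta",  # hypothetical file name
    "ProbeRequired": "NO",
    "BackgroundCheck": "NO",
    "DimerisationThresh": 40,
}
stricter_run = dict(previous_run, DimerisationThresh=15)
probe_run = dict(previous_run, ProbeRequired="EXO")

print(can_reuse_binding_sites(previous_run, stricter_run))
print(can_reuse_binding_sites(previous_run, probe_run))
```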

### IdentityThreshold

As explained in Output File Descriptions, if the input file contains multiple sequences, each index position in the alignment will be assigned an identity score. Each oligo binding site will be assigned an overall score as the mean of the identity scores of the index positions it covers. For example:

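| Abundance | Index Position | Nucleotide |
|---|---|---|
| 1 | 0 | A |
| 1 | 1 | C |
| 1 | 2 | G |
| 1 | 3 | A |
| 0.75 | 4 | A |
| 1 | 5 | A |
| 1 | 6 | A |
| 1 | 7 | T |
| 1 | 8 | A |
| 0.75 | 9 | T |
| 1 | 10 | A |
| 1 | 11 | G |
| 1 | 12 | G |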

An oligo whose binding site covers index positions 0-12 will have an identity score of 0.962 (the mean of eleven scores of 1 and two scores of 0.75, i.e. 12.5/13). Under the default IdentityThreshold of 99, this oligo binding site will therefore be excluded (0.962 × 100 = 96.2 < 99).
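
The worked example can be reproduced in a few lines. This is a minimal sketch rather than PrimedRPA's own code: the function names are hypothetical, and a score exactly equal to the threshold is assumed to pass.

```python
from statistics import mean

# Per-position identity scores for index positions 0-12 in the example.
identity_scores = [1, 1, 1, 1, 0.75, 1, 1, 1, 1, 0.75, 1, 1, 1]


def binding_site_identity(scores, start, end):
    """Mean identity score over the inclusive index range start..end."""
    return mean(scores[start:end + 1])


def passes_identity_threshold(scores, start, end, threshold=99):
    """Compare the binding-site score (as a percentage) with IdentityThreshold.

    Assumption: a score exactly equal to the threshold is accepted.
    """
    return binding_site_identity(scores, start, end) * 100 >= threshold


score = binding_site_identity(identity_scores, 0, 12)
print(round(score, 3))                                    # 0.962
print(passes_identity_threshold(identity_scores, 0, 12))  # False
```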

### CrossReactivityThresh

If a cross-reactivity check is required, the specified fasta file, containing all background sequences, is converted into a Blastn database. Each oligo binding site is then checked against this database, and a cross-reactivity score is generated for each hit as follows:

Cross Reactivity Score (CRS) = ((LA * (PI/100))/LQ) * 100

LA = Length of Oligo Binding Site Alignment
PI = Percentage Identity
LQ = Length of Oligo Binding Site

Each hit is then ranked according to its cross-reactivity score, and the maximum cross-reactivity score for a given oligo binding site is identified. If this maximum score is above the CrossReactivityThresh parameter, the binding site is excluded.
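
As an illustration of the formula and the maximum-score rule, the sketch below scores a list of hits for one binding site. The function names and the hit values are invented for the example; each hit is assumed to be an (alignment length, percentage identity) pair.

```python
def cross_reactivity_score(alignment_length, percent_identity, oligo_length):
    """CRS = ((LA * (PI / 100)) / LQ) * 100, as defined above."""
    return ((alignment_length * (percent_identity / 100)) / oligo_length) * 100


def exceeds_cross_reactivity_threshold(hits, oligo_length, threshold=65):
    """Return True if the maximum CRS over all hits is above the threshold.

    `hits` is an iterable of (alignment_length, percent_identity) pairs.
    A binding site with no hits is never excluded.
    """
    scores = [cross_reactivity_score(la, pi, oligo_length) for la, pi in hits]
    return bool(scores) and max(scores) > threshold


# Hypothetical hits for a 32-nucleotide binding site.
example_hits = [(20, 95.0), (28, 90.0)]
for la, pi in example_hits:
    print(la, pi, round(cross_reactivity_score(la, pi, 32), 2))
print(exceeds_cross_reactivity_threshold(example_hits, 32))
```

With these made-up hits, the second one gives the maximum score and lies above the default threshold of 65, so the binding site would be excluded.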

### HardCrossReactFilter

This parameter was added with the goal of providing a stringent cross-reactivity filter. Research has shown that primers as short as 18 nucleotides can result in successful RPA-based amplification and that mismatches present within the primer binding region can be tolerated if located away from the 3' primer terminus. Ref: DOI: 10.1128/mBio.00135-13 & Utilising Short Primers. Therefore, the hard-fail cross reactivity filter system follows the logic described below:

1. For each cross-reactivity hit, complementarity is assessed over the 22 bp extending downstream from the 5' end and the 22 bp extending upstream from the 3' end. Please see below:


AAAACAACGTCGGCCCCAAGGTTTACCCAATAA  -  Oligo Binding Site                              
||||||-||-||-||||||||||||||||||||
TTTTGTCGCGGCTGGGGTTCCAAATGGGTTATT   - Background Sequence                             


5': AAAACAACGTCGGCCCCAAGGT

3': GGCCCCAAGGTTTACCCAATAA

2. A score is then derived for each direction of potential binding based on the number of complementary (+1) and mismatch (-1) sites. Complementary sites towards the binding site terminus are weighted as follows:

| Position | Weighting |
|---|---|
| Terminus (t) | 3 |
| t-1 or t+1 | 2 |
| t-2 or t+2 | 1.5 |

AAAACAACGTCGGCCCCAAGGT 5' Score = 3+2+1.5+(1x16)+(3x-1) = 19.5

GGCCCCAAGGTTTACCCAATAA 3' Score = 3+2+1.5+(1x18)+(1x-1) = 23.5


3. If the overall score for either direction is greater than or equal to 21.5, the oligo binding site is marked as a Hard Fail and excluded from downstream analysis. The example above would therefore be marked as a hard fail because its 3' cross-reactivity score (23.5) exceeds 21.5.
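
The three steps can be sketched as follows. This is an illustration, not PrimedRPA's implementation. The terminal weights (3, 2, 1.5) are taken from the worked 5' and 3' scores in the example. The sketch assumes that the weighted terminus is the 5' end for the 5' window and the 3' end for the 3' window, that a mismatch always scores -1 regardless of position, and that the background sequence is supplied as the paired strand, position by position, as in the figure.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
TERMINAL_WEIGHTS = (3.0, 2.0, 1.5)  # terminus, then one and two bases inward
WINDOW = 22                         # bases assessed from each end
HARD_FAIL_SCORE = 21.5


def window_score(oligo_window, background_window):
    """Score one window whose terminus sits at index 0."""
    score = 0.0
    for i, (o, b) in enumerate(zip(oligo_window, background_window)):
        if COMPLEMENT.get(o) == b:
            # Complementary site: weighted near the terminus, +1 elsewhere.
            score += TERMINAL_WEIGHTS[i] if i < len(TERMINAL_WEIGHTS) else 1.0
        else:
            score -= 1.0  # assumption: every mismatch counts -1
    return score


def hard_fail_scores(oligo, background):
    """Return the (5' score, 3' score) for an oligo/background pairing."""
    five_prime = window_score(oligo[:WINDOW], background[:WINDOW])
    # Reverse the 3' window so that its terminus is at index 0.
    three_prime = window_score(oligo[-WINDOW:][::-1], background[-WINDOW:][::-1])
    return five_prime, three_prime


def is_hard_fail(oligo, background):
    return max(hard_fail_scores(oligo, background)) >= HARD_FAIL_SCORE


oligo = "AAAACAACGTCGGCCCCAAGGTTTACCCAATAA"
background = "TTTTGTCGCGGCTGGGGTTCCAAATGGGTTATT"
print(hard_fail_scores(oligo, background))  # (19.5, 23.5)
print(is_hard_fail(oligo, background))      # True
```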

### DimerisationThresh

This parameter is used in two stages:

1. To assess the potential for individual oligos to self-dimerise. For example:

CAATAAGAAATATTTCCAAAACTTAAGACCGC                              
-|--|---||-|-||---||-|-||---|--|---
CGCCAGAATTCAAAACCTTTATAAAGAATAAC                  

This oligo is complementary at 14 sites out of 32. Therefore, it would be given a dimerisation score of (14/32)*100 = 43.75. This is above the default threshold of 40, so this oligo would be excluded.

2. To assess whether the primers/probes within any given set could dimerise with one another. The example below shows the dimerisation potential between a forward primer and a probe. As shown, there are 11 complementary sites. Because the two sequences differ in length, the length of the shorter sequence (32) is used. Therefore, the dimerisation score is (11/32)*100 = 34.38%, which falls below the default threshold.

                 TTGTTTTGCCTGCACCTTTGCTTTGTGAGGAG                                
-------------------||--------||||-|-|-----||----|--------------------------------
TTTATTGTAATCGTGGAGTCAGGCTTTCTGTGGTAGCATCTGACGGAGCA                               
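
The primer/probe example above can be checked with a short sketch. The helper names are hypothetical, and the offset of 17 is read from the figure (the primer starts 17 positions into the probe). How PrimedRPA chooses which alignments to evaluate is not described here, so this only scores the single alignment shown.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_sites(top, bottom, offset=0):
    """Count positions where top[i] can pair with bottom[i + offset]."""
    count = 0
    for i, base in enumerate(top):
        j = i + offset
        if 0 <= j < len(bottom) and COMPLEMENT.get(base) == bottom[j]:
            count += 1
    return count


def dimerisation_score(sites, length_a, length_b):
    """Complementary sites relative to the shorter sequence, as a percentage."""
    return sites / min(length_a, length_b) * 100


primer = "TTGTTTTGCCTGCACCTTTGCTTTGTGAGGAG"
probe = "TTTATTGTAATCGTGGAGTCAGGCTTTCTGTGGTAGCATCTGACGGAGCA"

sites = complementary_sites(primer, probe, offset=17)
score = dimerisation_score(sites, len(primer), len(probe))
print(sites, score)  # 11 34.375
print(score > 40)    # False: below the default DimerisationThresh
```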

### BackgroundSearchSensitivity

Please see the table below for the specific Blastn parameters utilised under each setting:

| Option | Word Size | Gap Open | Gap Extend | Reward | Penalty |
|---|---|---|---|---|---|
| Fast | 7 | 5 | 2 | 1 | -3 |
| Basic | 4 | 5 | 2 | 1 | -2 |
| Advanced | 4 | 5 | 2 | 1 | -1 |
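
For reference, the table can be written as a lookup that yields the corresponding blastn command-line options. The option names (`-word_size`, `-gapopen`, `-gapextend`, `-reward`, `-penalty`) are standard BLAST+ blastn flags, but the way PrimedRPA itself invokes blastn is not shown here, so the helper below is a hypothetical sketch.

```python
# Values copied from the table above; keys are blastn option names.
BLASTN_SETTINGS = {
    "Fast": {"word_size": 7, "gapopen": 5, "gapextend": 2, "reward": 1, "penalty": -3},
    "Basic": {"word_size": 4, "gapopen": 5, "gapextend": 2, "reward": 1, "penalty": -2},
    "Advanced": {"word_size": 4, "gapopen": 5, "gapextend": 2, "reward": 1, "penalty": -1},
}


def blastn_options(sensitivity="Basic"):
    """Return the blastn arguments for a BackgroundSearchSensitivity setting."""
    if sensitivity not in BLASTN_SETTINGS:
        raise ValueError(f"Unknown sensitivity setting: {sensitivity!r}")
    args = []
    for flag, value in BLASTN_SETTINGS[sensitivity].items():
        args.extend([f"-{flag}", str(value)])
    return args


print(" ".join(blastn_options("Fast")))
```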

## Additional Notes

You can rename the parameters file by adding a prefix, e.g. Run_One_PrimedRPA_Parameters.txt. However, the parameters file will not be recognised if the string 'PrimedRPA_Parameters.txt' itself is altered.

To see help for each command line variable run the following command:

PrimedRPA --help