Simulating the SELEX procedure - drivenbyentropy/aptasuite GitHub Wiki
AptaSIM is a program that aims to realistically recreate the selection process during SELEX using error-prone PCR. For our simulation, we represent a pool as a set of sequences in which each sequence carries a count, representing its frequency, and a value between 0 and 100 simulating its binding affinity to a putative target. Given an initial pool, we then perform a user-defined number of iterations, each consisting of target-affine selection followed by error-prone amplification. The sequences remaining after the selection stage represent the sequenced portion of HT-SELEX and are stored for further analysis.
Initial Pool Generation
To allow for existing biases, such as the base composition and nucleotide dependencies of a pool originating from an in vitro SELEX experiment, the input sequences for the simulation can be generated from a first-order Markov chain. The chain is trained on real data and captures the conditional probability of selecting a nucleotide given the previous one.
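As an illustration of the idea above, the following Python sketch trains a first-order Markov chain on a handful of sequences and samples new ones from it. It is not AptaSIM's implementation: the function names, the pseudocount smoothing, and the decision to skip training sequences containing characters other than A, C, G, T are assumptions made for this example.

```python
import random

ALPHABET = "ACGT"


def train_first_order(sequences, pseudocount=1.0):
    """Estimate start and transition probabilities from training sequences.

    Assumption for this sketch: a pseudocount avoids zero probabilities, and
    sequences with characters outside ACGT are skipped.
    """
    start = {b: pseudocount for b in ALPHABET}
    trans = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        if not seq or set(seq) - set(ALPHABET):
            continue
        start[seq[0]] += 1
        # Count every (previous base, current base) pair in the sequence.
        for prev, cur in zip(seq, seq[1:]):
            trans[prev][cur] += 1

    def normalize(counts):
        total = sum(counts.values())
        return {b: c / total for b, c in counts.items()}

    return normalize(start), {a: normalize(row) for a, row in trans.items()}


def sample_sequence(start, trans, length, rng):
    """Draw one sequence: the first base from `start`, then each base
    conditioned on the previous one."""
    if length <= 0:
        return ""
    seq = rng.choices(ALPHABET, weights=[start[b] for b in ALPHABET])
    while len(seq) < length:
        row = trans[seq[-1]]
        seq += rng.choices(ALPHABET, weights=[row[b] for b in ALPHABET])
    return "".join(seq)


if __name__ == "__main__":
    rng = random.Random(42)
    start, trans = train_first_order(["ACGT", "ACGA", "AAAA"])
    print(sample_sequence(start, trans, 40, rng))
```

A higher-order model would condition on more than one preceding nucleotide; the sketch keeps to first order to match the description above.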
Target Affine Sampling
The sampling step simulates the incubation, binding, partitioning, and washing of a selection cycle during a SELEX experiment. Assuming that enriched and target-affine species have a higher probability of selection, we sample, without replacement, a user-defined percentage of the current pool according to the distribution of the sequence counts and accept a sequence with a probability corresponding to its binding affinity.
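The selection step described above can be sketched in Python as follows. This is a minimal illustration, not AptaSIM's code: the pool layout (a dict mapping each sequence to a `(count, affinity)` tuple), the function name `select`, and the choice to discard a drawn molecule when it is rejected are all assumptions of this example.

```python
import random


def select(pool, fraction, rng):
    """Target-affine sampling sketch.

    pool:     dict mapping sequence -> (count, affinity in 0..100)
    fraction: share of the pool (0..1) that should remain after selection
    Returns a dict with the same layout holding the selected molecules.
    """
    remaining = {s: c for s, (c, _) in pool.items() if c > 0}
    target = int(fraction * sum(remaining.values()))
    selected = {}
    picked = 0
    # Stop when enough molecules were accepted or the pool is exhausted.
    while picked < target and remaining:
        seqs = list(remaining)
        # Draw one molecule with probability proportional to its count.
        seq = rng.choices(seqs, weights=[remaining[s] for s in seqs])[0]
        # Without replacement: the drawn molecule leaves the pool
        # (assumption: also when it is rejected below).
        remaining[seq] -= 1
        if remaining[seq] == 0:
            del remaining[seq]
        # Accept with probability equal to the affinity, read as a percentage.
        if rng.random() < pool[seq][1] / 100.0:
            selected[seq] = selected.get(seq, 0) + 1
            picked += 1
    return {s: (c, pool[s][1]) for s, c in selected.items()}


if __name__ == "__main__":
    rng = random.Random(1)
    pool = {"ACGTACGT": (60, 90), "TTTTGGGG": (40, 10)}
    print(select(pool, 0.20, rng))
```

Rebuilding the weight list on every draw keeps the sketch short but costs linear time per draw; a pool of realistic size would need a more efficient weighted sampling structure.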
Amplification
To restore the pool to its original size, we simulate a number of PCR cycles for which the amplification efficiency e and the mutation probability p can be specified. The number of required PCR cycles is computed automatically. In each PCR cycle, every aptamer is subject to amplification as many times as its current count. Each copy is amplified with the specified amplification efficiency and, when it is amplified, the mutation probability decides whether an exact duplicate or a mutant is introduced into the pool.
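A rough Python sketch of the amplification stage is given below. The text above only states that the cycle count is computed automatically, so the formula in `required_cycles` is an assumption: it supposes that each cycle multiplies the pool size by (1 + e) on average. The pool here maps sequences to counts only, a mutant is modeled as a single uniformly chosen base substitution, and all function names are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"


def required_cycles(current_size, target_size, efficiency):
    """Assumed rule: smallest n with current_size * (1 + e)^n >= target_size."""
    if current_size <= 0 or efficiency <= 0:
        raise ValueError("current_size and efficiency must be positive")
    if current_size >= target_size:
        return 0
    growth = math.log(target_size / current_size)
    return math.ceil(growth / math.log(1.0 + efficiency))


def mutate(seq, rng):
    """Substitute one random position with a different base (assumption)."""
    pos = rng.randrange(len(seq))
    base = rng.choice([b for b in ALPHABET if b != seq[pos]])
    return seq[:pos] + base + seq[pos + 1:]


def pcr_cycle(pool, efficiency, mutation_prob, rng):
    """One PCR cycle over a dict mapping sequence -> count."""
    new_pool = dict(pool)  # templates stay in the pool
    for seq, count in pool.items():
        for _ in range(count):
            # Each copy is amplified with probability `efficiency`.
            if rng.random() >= efficiency:
                continue
            # An amplified copy yields a mutant with probability
            # `mutation_prob`, otherwise an exact duplicate.
            product = mutate(seq, rng) if rng.random() < mutation_prob else seq
            new_pool[product] = new_pool.get(product, 0) + 1
    return new_pool


if __name__ == "__main__":
    rng = random.Random(7)
    pool = {"ACGTACGT": 100}
    for _ in range(required_cycles(100, 500, 0.995)):
        pool = pcr_cycle(pool, 0.995, 0.05, rng)
    print(sum(pool.values()), len(pool))
```

Under this assumed rule, restoring a pool that was reduced to 20% of its size with an efficiency of 0.995 takes three cycles, since two cycles would fall just short of a fivefold increase.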
Graphical User Interface
Use the New Simulated Experiment option in the File menu to open the wizard, which will guide you through the creation of a simulated data set.
Command Line Interface
AptaSIM can be called with the following command within AptaSUITE:
java -jar aptasuite.jar -simulate -config /path/to/configuration/file
Mandatory Configuration File Parameters
By default, AptaSIM generates half a million sequences with a 40 nt randomized region and an equal nucleotide distribution, and performs as many selection cycles as specified in SelectionCycle.* (see the Configuration File section).
If a Markov model should be trained instead, a FASTQ file must be specified as follows:
# Path to the sequences used to train the markov model in fastq format
Aptasim.HmmFile = /path/to/markov/model
Default Parameters
All other aspects of the selection can be modified with the following parameters:
# The degree of the Markov model. The larger this value, the more higher-order
# dependencies will be captured from the training data
Aptasim.HmmDegree = 2
# Length of the randomized region in the generated aptamers
Aptasim.RandomizedRegionSize = 40
# Number of (unique) sequences in the initial pool
Aptasim.NumberOfSequences = 500000
# Number of high affinity sequences in the initial pool
Aptasim.NumberOfSeeds = 100
# The minimal affinity for seed sequences (INT range: 0-100)
Aptasim.MinSeedAffinity = 80
# Maximal count of remaining sequences
Aptasim.MaxSequenceCount = 10
# The maximal sequence affinity for non-seeds (INT range: 0-100)
Aptasim.MaxSequenceAffinity = 25
# If no training data is specified, create pool based on this distribution
# (order A,C,G,T)
Aptasim.NucleotideDistribution = 0.25, 0.25, 0.25, 0.25
# The percentage of sequences that remain after selection (DOUBLE range: 0-1)
Aptasim.SelectionPercentage = 0.20
# Mutation rates for individual nucleotides (order A,C,G,T)
Aptasim.BaseMutationRates = 0.25, 0.25, 0.25, 0.25
# Mutation probability during PCR (DOUBLE range: 0-1)
Aptasim.MutationProbability = 0.05
# PCR amplification efficiency (DOUBLE range: 0-1)
Aptasim.AmplificationEfficiency = 0.995