index_three_errors_approximate_search - seqan/bench GitHub Wiki
Approximate search, three errors in Hamming distance (Index)
- Category: Index
- Validator: Hamming Distance Validator
Description
Using a precomputed index, find all occurrences in the genome of each 50-mer read with at most 3 errors (Hamming distance, i.e., mismatches only, no insertions or deletions).
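To pin down what counts as a match, here is a minimal brute-force sketch that scans every window of the genome. It assumes, as the Example section shows, that a match may have anywhere from 0 to 3 mismatches and that the reported end position is the window's 0-based start plus the read length. It uses no index, so it is only a reference for checking results on small inputs, not a solution to the benchmark; the function name hamming_end_positions is hypothetical.

```python
def hamming_end_positions(genome: str, read: str, max_errors: int = 3) -> list[int]:
    """Return end positions (start + len(read)) of every genome window whose
    Hamming distance to `read` is at most `max_errors`."""
    m = len(read)
    hits = []
    for start in range(len(genome) - m + 1):
        mismatches = 0
        for a, b in zip(genome[start:start + m], read):
            if a != b:
                mismatches += 1
                if mismatches > max_errors:
                    break  # early exit: this window can no longer match
        if mismatches <= max_errors:
            hits.append(start + m)  # end position = start + read length
    return hits


genome = "GCCGCGCGTCGTCGGTC"
print(hamming_end_positions(genome, "GTCG"))
# [4, 6, 7, 8, 9, 11, 14, 15, 17]
```

The early exit keeps the scan cheap for windows that are far from the read, but the whole genome is still visited for every read, which is exactly what the precomputed index is meant to avoid.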
Input
- A precomputed index of a genome (i.e., of data/genome.fa)
- A file of 10 million DNA reads of length 50, randomly sampled from the genome (i.e., data/genome.index.reads.length.50.fa)
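The reads file is in FASTA format (the Example shows each read as a ">" header line followed by its sequence). A minimal parser sketch, assuming plain uncompressed FASTA in which a sequence may be wrapped over several lines; parse_fasta is a hypothetical helper, not part of the benchmark or of SeqAn:

```python
from typing import Iterable, Iterator


def parse_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header = None
    chunks: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()  # drop ">" and surrounding spaces
            chunks = []
        else:
            chunks.append(line.upper())  # normalise soft-masked lowercase bases
    if header is not None:
        yield header, "".join(chunks)


example = ["> 1", "GTCG", "> 2", "AAAA", "> 3", "CGCG"]
for read_index, (header, seq) in enumerate(parse_fasta(example)):
    print(read_index, header, seq)
```

Numbering reads with enumerate gives the 0-based read indices that the Example output uses. On the real data, pass an open file handle for data/genome.index.reads.length.50.fa instead of the list; the generator keeps only one read in memory at a time.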
Output
For each of the 10 million reads, list all end positions of its matches in the genome; omit reads that have no match.
The output must be written to a file.
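A sketch of the output step, assuming the line format shown in the Example (read index, a colon, then comma-separated end positions) and that reads without hits are simply skipped. The exact format expected by the Hamming Distance Validator is not given on this page, so treat write_hits (a hypothetical name) as an illustration only:

```python
import io
from typing import Iterable, TextIO


def write_hits(out: TextIO, results: Iterable[tuple[int, list[int]]]) -> None:
    """Write "<read index>: p1, p2, ..." for every read that has at least one hit."""
    for read_index, end_positions in results:
        if not end_positions:
            continue  # reads without a match are omitted from the output
        positions = ", ".join(str(p) for p in end_positions)
        out.write(f"{read_index}: {positions}\n")


# Demo with an in-memory buffer; for the benchmark, pass a file opened with open(path, "w").
buffer = io.StringIO()
write_hits(buffer, [(0, [6, 8, 4]), (1, []), (2, [5, 6])])
print(buffer.getvalue(), end="")
# 0: 6, 8, 4
# 2: 5, 6
```

Taking a text stream rather than a path lets the same function write to a real file or to a buffer in tests.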
Example
For simplicity, the example uses 4-mers instead of 50-mers; the error threshold is still 3.
Genome (the ruler marks 0-based positions 0, 5, 10 and 15; only the last digit is shown):
position: 0    5    0    5
Genome  : GCCGCGCGTCGTCGGTC
Reads:
> 1
GTCG
> 2
AAAA
> 3
CGCG
Output:
0: 6, 8, 4, 7, 9, 17, 14, 11, 15
2: 5, 6, 8, 16, 13, 10, 4, 17, 14, 11, 15
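Reads are identified by their 0-based position in the input file, so read 1 (AAAA) is omitted because it has no match. Each end position is the 0-based start of a match plus the read length; for example, the exact occurrence of GTCG starting at position 7 is reported as 11. The positions of a read may be listed in any order.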
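The task does not prescribe a particular index. As one possible approach, not necessarily the one the benchmark intends, the sketch below pairs a hash-based q-gram index with the pigeonhole principle: if a read has at most k mismatches, then among k + 1 non-overlapping pieces of the read at least one occurs without error. Exact lookups of the pieces therefore yield every candidate start, and each candidate is then verified by counting mismatches. With 50-mers and k = 3 the pieces have length q = 50 // 4 = 12. The names build_qgram_index and search are hypothetical; a production solution would more likely use a compressed full-text index such as an FM index.

```python
from collections import defaultdict


def build_qgram_index(genome: str, q: int) -> dict[str, list[int]]:
    """Map every q-gram of the genome to the list of its 0-based start positions."""
    index = defaultdict(list)
    for start in range(len(genome) - q + 1):
        index[genome[start:start + q]].append(start)
    return dict(index)


def search(genome: str, index: dict[str, list[int]], q: int,
           read: str, max_errors: int = 3) -> list[int]:
    """Return sorted end positions of all windows within max_errors mismatches of read."""
    m = len(read)
    if q < 1 or (max_errors + 1) * q > m:
        raise ValueError("read must hold max_errors + 1 disjoint pieces of length q")
    # Filtering: one of the max_errors + 1 disjoint pieces must match exactly.
    candidates = set()
    for piece in range(max_errors + 1):
        offset = piece * q
        for pos in index.get(read[offset:offset + q], []):
            start = pos - offset  # shift the piece hit back to a read start
            if 0 <= start <= len(genome) - m:
                candidates.add(start)
    # Verification: count mismatches over the whole window.
    hits = []
    for start in sorted(candidates):
        mismatches = sum(a != b for a, b in zip(genome[start:start + m], read))
        if mismatches <= max_errors:
            hits.append(start + m)  # end position = start + read length
    return hits


genome = "GCCGCGCGTCGTCGGTC"
reads = ["GTCG", "AAAA", "CGCG"]
max_errors = 3
q = len(reads[0]) // (max_errors + 1)  # 4-mers give q = 1; 50-mers give q = 12
index = build_qgram_index(genome, q)
for read_index, read in enumerate(reads):
    hits = search(genome, index, q, read, max_errors)
    if hits:
        print(f"{read_index}: {', '.join(map(str, hits))}")
# 0: 4, 6, 7, 8, 9, 11, 14, 15, 17
# 2: 4, 5, 6, 8, 10, 11, 13, 14, 15, 16, 17
```

The positions come out sorted, which the Example explicitly allows, and the sets agree with the Example output. For a full genome, a Python dictionary of all 12-grams would need a lot of memory, which is one reason compressed indices are usually preferred for this kind of search.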