index_three_errors_approximate_search - seqan/bench GitHub Wiki

Approximate search, three errors in Hamming distance (Index)

Description

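Using a precomputed index, find all positions in the genome where a 50-mer read matches with up to 3 errors (mismatches only, i.e., Hamming distance at most 3). Exact matches and matches with 1 or 2 mismatches are reported as well.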
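The benchmark itself is about index-based search, but a brute-force scan is handy as a reference when checking results. The following is a minimal sketch, not the method used by any benchmarked tool: it slides the read over the genome and counts mismatches directly. The function name hamming_end_positions is hypothetical. It assumes that a window counts as a match when it has at most max_errors mismatches and that the reported end position is start + read length, which is what the 4-mer example on this page reflects.

```python
from typing import List


def hamming_end_positions(genome: str, read: str, max_errors: int) -> List[int]:
    """Return the end position (start + len(read)) of every window of the
    genome whose Hamming distance to the read is at most max_errors."""
    m = len(read)
    ends = []
    for start in range(len(genome) - m + 1):
        window = genome[start:start + m]
        # Hamming distance: number of positions where the characters differ.
        mismatches = sum(1 for a, b in zip(window, read) if a != b)
        if mismatches <= max_errors:
            ends.append(start + m)
    return ends


# The 4-mer example from this page (reads GTCG, AAAA, CGCG).
genome = "GCCGCGCGTCGTCGGTC"
print(hamming_end_positions(genome, "GTCG", 3))  # [4, 6, 7, 8, 9, 11, 14, 15, 17]
print(hamming_end_positions(genome, "AAAA", 3))  # []
print(hamming_end_positions(genome, "CGCG", 3))  # [4, 5, 6, 8, 10, 11, 13, 14, 15, 16, 17]
```

This scan takes time proportional to genome length times read length per read, so it is only practical for small test inputs, not for the full 10 million reads.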

Input

  • A precomputed index of a genome (i.e., of data/genome.fa)
  • A file of 10 million DNA reads of length 50, randomly sampled from the genome (i.e., data/genome.index.reads.length.50.fa)

Output

For each of the 10 million reads, list the end positions of all its matches in the genome. A read without any match is omitted from the output.

The output must be written into a file.
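As a rough sketch of how such a file could be produced, the snippet below writes one line per read in the "id: pos, pos, ..." layout used in the example on this page and skips reads without hits. The function name write_hits and the dictionary layout (read number mapped to a list of end positions) are assumptions made for illustration; the page does not prescribe a particular data structure.

```python
import io
from typing import Dict, List, TextIO


def write_hits(out: TextIO, hits: Dict[int, List[int]]) -> None:
    """Write one 'read_id: p1, p2, ...' line per read that has hits."""
    for read_id in sorted(hits):
        positions = hits[read_id]
        if not positions:
            continue  # reads without any match are omitted
        out.write(f"{read_id}: {', '.join(map(str, positions))}\n")


# Demo with an in-memory buffer; read 1 has no hits and is skipped.
buffer = io.StringIO()
write_hits(buffer, {0: [6, 8, 4], 1: [], 2: [5, 6]})
print(buffer.getvalue(), end="")
# 0: 6, 8, 4
# 2: 5, 6
```

To write to an actual file, pass an open file handle instead of the buffer, for example the result of open("result.txt", "w") (the file name is hypothetical).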

Example

For simplicity we assume 4-mers instead of 50-mers.

Genome:

position: 0    5    0    5
Genome  : GCCGCGCGTCGTCGGTC

Reads:

> 1
GTCG
> 2
AAAA
> 3
CGCG

Output:

0: 6, 8, 4, 7, 9, 17, 14, 11, 15
2: 5, 6, 8, 16, 13, 10, 4, 17, 14, 11, 15

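Note that reads are numbered from 0 in the output, so the line for 1: (read AAAA) was omitted because it has no match. Each reported value is the position just past the last matched character (start position plus read length), and the positions can be listed in any order.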
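Because the positions may appear in any order, two result files are best compared as sets rather than line by line. The sketch below parses the output format shown above and compares two results. The function name parse_hits is hypothetical. It additionally assumes that the order of the read lines does not matter and that duplicate positions can be ignored, neither of which is stated on this page.

```python
from typing import Dict, Set


def parse_hits(text: str) -> Dict[int, Set[int]]:
    """Parse lines of the form 'read_id: p1, p2, ...' into read_id -> set of positions."""
    hits: Dict[int, Set[int]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        read_id, _, positions = line.partition(":")
        hits[int(read_id)] = {int(p) for p in positions.split(",") if p.strip()}
    return hits


# Expected output from the example, and the same hits in a different order.
expected = parse_hits(
    "0: 6, 8, 4, 7, 9, 17, 14, 11, 15\n"
    "2: 5, 6, 8, 16, 13, 10, 4, 17, 14, 11, 15\n"
)
candidate = parse_hits(
    "2: 4, 5, 6, 8, 10, 11, 13, 14, 15, 16, 17\n"
    "0: 4, 6, 7, 8, 9, 11, 14, 15, 17\n"
)
print(expected == candidate)  # True
```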