index_one_error_approximate_search - seqan/bench GitHub Wiki

Approximate search, one error in hamming distance (Index)

Description

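Using a precomputed index, find all positions at which each 50-mer read occurs in the genome with at most one mismatch (Hamming distance ≤ 1). As the example below shows, exact occurrences are reported as well.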
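
The Hamming distance between two strings of equal length is the number of positions at which they differ. A minimal Python sketch (the function name `hamming` is an illustrative choice, not part of the benchmark):

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


# GTCG and GCCG differ only at the second character.
print(hamming("GTCG", "GCCG"))  # 1
```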

Input

  • A precomputed index of a genome (i.e., data/genome.fa)
  • A file of 10 million DNA reads of length 50, randomly sampled from the genome (i.e., data/genome.index.reads.length.50.fa)
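
One common way to use an index for this kind of query is the pigeonhole principle: if a read matches with at most one mismatch, at least one of its two halves matches exactly, so exact lookups of the halves yield candidate positions that are then verified. The sketch below is only an illustration under stated assumptions: it uses a plain k-mer dictionary as a stand-in index (this page does not prescribe an index type), it treats exact occurrences as matches, as the example further down does, and the names `build_kmer_index` and `search_one_mismatch` are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(genome: str, k: int) -> dict:
    """Map every k-mer of the genome to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def search_one_mismatch(index: dict, genome: str, read: str, k: int) -> list:
    """Start positions where `read` occurs with at most one mismatch.

    Assumes len(read) == 2 * k, so the read splits into two seeds of length k.
    """
    starts = set()
    for offset in (0, k):  # offset of each seed within the read
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if start < 0 or start + len(read) > len(genome):
                continue  # the full read would not fit into the genome here
            window = genome[start:start + len(read)]
            if sum(a != b for a, b in zip(window, read)) <= 1:
                starts.add(start)
    return sorted(starts)


genome = "GCCGCGCGTCGTCGGTC"
index = build_kmer_index(genome, 2)
print(search_one_mismatch(index, genome, "GTCG", 2))  # [0, 7, 10]
```

For 50-mers the same split would use k = 25. The function returns start positions; adding the read length gives the end positions 4, 11 and 14 listed for read 0 in the example.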

Output

For each of the 10 million reads, list the end positions of all its matches in the genome. If a read has no match, omit it from the output.

The output must be written into a file.
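
The file layout is only shown by example on this page. A minimal sketch of a writer that follows that example (one `read number: pos, pos, ...` line per read, reads without matches skipped); the function name `write_hits` is hypothetical:

```python
import sys
from typing import Dict, List, TextIO


def write_hits(out: TextIO, hits: Dict[int, List[int]]) -> None:
    """Write one line per read that has at least one match."""
    for read_id, ends in hits.items():
        if not ends:  # reads without a match are omitted
            continue
        out.write(f"{read_id}: {', '.join(map(str, ends))}\n")


# Demonstration on standard output; for the benchmark, pass a file
# opened with open(path, "w") instead.
write_hits(sys.stdout, {0: [4, 14, 11], 1: [], 2: [6, 8]})
```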

Example

For simplicity we assume 4-mers instead of 50-mers.

Genome:

position: 0    5    0    5
Genome  : GCCGCGCGTCGTCGGTC

Reads:

> 1
GTCG
> 2
AAAA
> 3
CGCG

Output:

0: 4, 14, 11
2: 6, 8

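Note that reads are numbered from 0 in the output, in the order they appear in the file, so 1: (the read AAAA) was omitted because it has no match. The positions of a read can be listed in any order. Each reported position is the 0-based start of the match plus the read length: GTCG matches GCCG at start 0 with one mismatch, giving 4, and matches exactly at starts 7 and 10, giving 11 and 14.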
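
The example can be reproduced by brute force, without any index, which is handy for checking an indexed implementation on small inputs. The sketch below assumes what the example output implies: a match may have zero or one mismatch, and the reported position is the start of the match plus the read length. The name `end_positions` is illustrative.

```python
def end_positions(genome: str, read: str, max_mismatches: int = 1) -> list:
    """End positions (start + len(read)) of all windows within the mismatch budget."""
    m = len(read)
    ends = []
    for start in range(len(genome) - m + 1):
        window = genome[start:start + m]
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches <= max_mismatches:
            ends.append(start + m)
    return ends


genome = "GCCGCGCGTCGTCGGTC"
reads = ["GTCG", "AAAA", "CGCG"]
for number, read in enumerate(reads):  # reads are numbered from 0
    ends = end_positions(genome, read)
    if ends:  # reads without a match are omitted
        print(f"{number}: {', '.join(map(str, ends))}")
# prints:
# 0: 4, 11, 14
# 2: 6, 8
```

This scan costs time proportional to genome length times read length per read, so it is only practical as a correctness check on small inputs.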