kmers_10mer_counting - seqan/bench GitHub Wiki

Counting 10-mers (Counting k-mers)

Description

Count the occurrences of every 10-mer (substring of length 10) in a given text.

Input

  • A text (i.e., data/genome.fa)

Output

For each 10-mer, report the start position of its first occurrence in the text (0-based, as in the example) and the number of times it occurs in the text. Overlapping occurrences are counted separately.

To limit the output, omit all 10-mers that occur fewer than 5 times.

The output must be written into a file.
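
The requirements above can be sketched in a few lines of Python. This is only an illustration, not the benchmark's reference implementation: the function names `count_kmers` and `write_counts` are hypothetical, the text is assumed to be already loaded as a plain string (no FASTA header lines or line breaks), and the `position: count` line format is assumed from the example on this page.

```python
from collections import Counter


def count_kmers(text, k=10, min_count=5):
    """Return sorted (first_position, count) pairs for every k-mer
    that occurs at least min_count times in text."""
    counts = Counter()
    first_pos = {}
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        counts[kmer] += 1
        first_pos.setdefault(kmer, i)  # keep only the first start position
    hits = [(first_pos[kmer], n) for kmer, n in counts.items() if n >= min_count]
    return sorted(hits)  # ordered by first occurrence


def write_counts(path, hits):
    """Write one 'position: count' line per reported k-mer (assumed format)."""
    with open(path, "w") as out:
        for pos, n in hits:
            out.write(f"{pos}: {n}\n")
```

A dictionary keyed by the k-mer string keeps the sketch short. A real benchmark entry would more likely use a packed 2-bit encoding or a dedicated index, which this sketch does not attempt.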

Example

For simplicity, this example uses 4-mers instead of 10-mers; the threshold of 5 occurrences stays the same.

Genome:

position: 0    5    0    5    0
Genome  : AAAAAAAAAGCGCGCGCGCGCTTTA

Output:

0: 6
9: 5

Explained Output:

0 (AAAA): 6 // first time AAAA occurred
1 (AAAA): 6 // second time AAAA occurred -> omit
[...]
6 (AAAG): 1 // first time AAAG occurred, but below 5 -> omit
7 (AAGC): 1 // first time AAGC occurred, but below 5 -> omit
8 (AGCG): 1 // first time AGCG occurred, but below 5 -> omit
9 (GCGC): 5 // first time GCGC occurred
10 (CGCG): 4 // first time CGCG occurred, but below 5 -> omit
11 (GCGC): 5 // second time GCGC occurred -> omit
12 (CGCG): 4 // second time CGCG occurred and below 5 -> omit
[...]
18 (CGCT): 1 // first time CGCT occurred, but below 5 -> omit
19 (GCTT): 1 // first time GCTT occurred, but below 5 -> omit
20 (CTTT): 1 // first time CTTT occurred, but below 5 -> omit
21 (TTTA): 1 // first time TTTA occurred, but below 5 -> omit
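
The explained listing above can be reproduced with a short script. This is a hedged sketch for checking the example by hand, assuming 0-based positions and overlapping occurrences as in the listing; the helper name `explain_kmers` is hypothetical and not part of the benchmark.

```python
from collections import Counter


def explain_kmers(text, k, min_count):
    """Return one (position, kmer, count, reported) tuple per start position.

    'reported' is True only at the first occurrence of a k-mer whose
    total count reaches min_count.
    """
    last_start = len(text) - k
    counts = Counter(text[i:i + k] for i in range(last_start + 1))
    seen = set()
    rows = []
    for i in range(last_start + 1):
        kmer = text[i:i + k]
        is_first = kmer not in seen
        seen.add(kmer)
        rows.append((i, kmer, counts[kmer], is_first and counts[kmer] >= min_count))
    return rows


genome = "AAAAAAAAAGCGCGCGCGCGCTTTA"
rows = explain_kmers(genome, k=4, min_count=5)
for pos, kmer, n, reported in rows:
    # Mark every line that would not appear in the final output.
    print(f"{pos} ({kmer}): {n}" + ("" if reported else " // omit"))
```

Only the lines without the `// omit` marker, positions 0 and 9 here, belong in the output file.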