kmers_10mer_counting - seqan/bench GitHub Wiki
Counting 10-mers (Counting k-mers)
- Category: Counting k-mers
- Validator: Counting k-mers Validator
Description
Count all 10-mers (substrings of length 10) in a given text. Occurrences may overlap, so every start position in the text contributes one 10-mer.
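As a rough illustration of the counting step (Python is used here only for exposition; the task does not prescribe a language), a sliding window of width k visits every start position once, so overlapping occurrences are counted. The function name `count_kmers` is hypothetical.

```python
from collections import Counter


def count_kmers(text: str, k: int = 10) -> Counter:
    """Count every length-k substring of `text`, overlapping windows included."""
    # A text of length n has n - k + 1 windows; range() is empty when n < k.
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


counts = count_kmers("AAAAAAAAAGCGCGCGCGCGCTTTA", k=4)
print(counts["AAAA"], counts["GCGC"], counts["CGCG"])  # 6 5 4
```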
Input
- A text (i.e., data/genome.fa)
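The `.fa` extension suggests the input is in FASTA format. A minimal reader sketch, assuming that header lines start with `>` and that all sequence lines are joined into one text; the function names are hypothetical, and multiple records, lowercase (soft-masked) bases, and `N` characters are not treated specially.

```python
from typing import Iterable


def fasta_sequence(lines: Iterable[str]) -> str:
    """Join the sequence lines of FASTA-formatted input into one text.

    Assumptions: lines starting with '>' are headers and are skipped; several
    records are concatenated end to end (this creates k-mers that span record
    boundaries); letter case is left unchanged.
    """
    parts = []
    for line in lines:
        line = line.strip()  # drop the newline and surrounding whitespace
        if line and not line.startswith(">"):
            parts.append(line)
    return "".join(parts)


def read_fasta_text(path: str) -> str:
    """Read a FASTA file such as data/genome.fa into a single string."""
    with open(path) as handle:
        return fasta_sequence(handle)
```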
Output
For each 10-mer, report the start position of its first occurrence in the text (0-based) and its total number of occurrences.
To limit the output, omit all 10-mers that occur fewer than 5 times.
The output must be written to a file.
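One way to meet these requirements in a single pass is to store, for each k-mer, its first start position together with a running count, and to apply the threshold when writing. This is a sketch under assumptions: the `position: count` line format is taken from the example below, the function names are hypothetical, and Python 3.7+ is assumed so that dicts preserve insertion order.

```python
def first_positions_and_counts(text: str, k: int = 10) -> dict:
    """Map each k-mer to [0-based start of its first occurrence, occurrence count]."""
    stats = {}
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        entry = stats.get(kmer)
        if entry is None:
            stats[kmer] = [i, 1]  # first occurrence: remember its start position
        else:
            entry[1] += 1  # later occurrence: only increase the count
    return stats


def frequent_kmer_lines(text: str, k: int = 10, min_count: int = 5) -> list:
    """Format one 'position: count' line per k-mer occurring at least min_count times."""
    # Each k-mer is inserted into the dict at its first occurrence and dicts keep
    # insertion order, so the lines come out sorted by first position.
    return [f"{pos}: {cnt}"
            for pos, cnt in first_positions_and_counts(text, k).values()
            if cnt >= min_count]


def write_frequent_kmers(text: str, out_path: str, k: int = 10, min_count: int = 5) -> None:
    """Write the frequent k-mers to out_path, one per line."""
    with open(out_path, "w") as out:
        for line in frequent_kmer_lines(text, k, min_count):
            out.write(line + "\n")
```

A dict keyed by strings is the simplest choice, though not the most compact: over the alphabet {A, C, G, T} there are only 4^10 = 1,048,576 possible 10-mers, so encoding each base in 2 bits and indexing two flat arrays (first position, count) is a common leaner alternative.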
Example
For simplicity, the example uses 4-mers instead of 10-mers.
Genome:
position: 0    5    0    5    0
Genome  : AAAAAAAAAGCGCGCGCGCGCTTTA
Output:
0: 6
9: 5
Explained Output:
0 (AAAA): 6 // first time AAAA occurred
1 (AAAA): 6 // second time AAAA occurred -> omit
[...]
6 (AAAG): 1 // first time AAAG occurred, but below 5 -> omit
7 (AAGC): 1 // first time AAGC occurred, but below 5 -> omit
8 (AGCG): 1 // first time AGCG occurred, but below 5 -> omit
9 (GCGC): 5 // first time GCGC occurred
10 (CGCG): 4 // first time CGCG occurred, but below 5 -> omit
11 (GCGC): 5 // second time GCGC occurred -> omit
12 (CGCG): 4 // second time CGCG occurred and below 5 -> omit
[...]
18 (CGCT): 1 // first time CGCT occurred, but below 5 -> omit
19 (GCTT): 1 // first time GCTT occurred, but below 5 -> omit
20 (CTTT): 1 // first time CTTT occurred, but below 5 -> omit
21 (TTTA): 1 // first time TTTA occurred, but below 5 -> omit
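The explained output follows one rule: a position is reported only if it is the first occurrence of its k-mer and that k-mer's total count is at least 5. The sketch below (Python, hypothetical function name) makes the rule explicit with two passes over the example genome, first collecting totals and then annotating each position.

```python
from collections import Counter


def explain_positions(text: str, k: int = 4, min_count: int = 5) -> list:
    """Annotate every window position the way the explained output does."""
    n_windows = len(text) - k + 1
    # Pass 1: total number of occurrences of each k-mer.
    totals = Counter(text[i:i + k] for i in range(n_windows))
    seen = set()
    lines = []
    # Pass 2: walk positions left to right; only a first occurrence whose
    # total reaches min_count is kept, everything else is omitted.
    for i in range(n_windows):
        kmer = text[i:i + k]
        first = kmer not in seen
        seen.add(kmer)
        keep = first and totals[kmer] >= min_count
        status = "first" if first else "repeat"
        action = "keep" if keep else "omit"
        lines.append(f"{i} ({kmer}): {totals[kmer]} // {status} -> {action}")
    return lines


for line in explain_positions("AAAAAAAAAGCGCGCGCGCGCTTTA"):
    if line.endswith("keep"):
        print(line)
# 0 (AAAA): 6 // first -> keep
# 9 (GCGC): 5 // first -> keep
```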