Lyndon Words - dvberkel/debruijn GitHub Wiki
A Lyndon Word is an aperiodic word that is the lexicographically smallest of its rotations.
Take the alphabet A = {0, 1} and the word 110010
over the alphabet. The rotations of 110010
are
110010, 100101, 001011, 010110, 101100, 011001
Listing the above words in lexicographical order we find a Lyndon word 001011
at the front of the list.
001011, 010110, 011001 100101, 101100, 110010
Generating Lyndon Words
We can generate the Lyndon words of length at most n in lexicographical order by using the following algorithm.
def LyndonWords(s,n):
"""Generate nonempty Lyndon words of length <= n over an s-symbol alphabet.
The words are generated in lexicographic order, using an algorithm from
J.-P. Duval, Theor. Comput. Sci. 1988, doi:10.1016/0304-3975(88)90113-2.
As shown by Berstel and Pocchiola, it takes constant average time
per generated word."""
w = [-1] # set up for first increment
while w:
w[-1] += 1 # increment the last non-z symbol
yield w
m = len(w)
while len(w) < n: # repeat word to fill exactly n syms
w.append(w[-m])
while w and w[-1] == s - 1: # delete trailing z's
w.pop()
When called with s = 2
and n = 3
it will yield the following elements in order.
[0], [0, 0, 1], [0, 1, 1], [1]
Application
If one concatenates all the Lyndon words, listed in lexicographic order, which divides a number n the result is a De Bruijn sequence.
For example by concatenating the above Lyndon words one obtains
00010111
Which is a B(2,3) De Bruijn sequence as can be seen from the following diagram.
00010111
000
001
010
101
011
111
0 11
00 1