Lyndon Words - dvberkel/debruijn GitHub Wiki

A Lyndon Word is an aperiodic word that is the lexicographically smallest of its rotations.

Take the alphabet A = {0, 1} and the word 110010 over the alphabet. The rotations of 110010 are

110010, 100101, 001011, 010110, 101100, 011001

Listing the above words in lexicographical order we find a Lyndon word 001011 at the front of the list.

001011, 010110, 011001 100101, 101100, 110010

Generating Lyndon Words

We can generate the Lyndon words of length at most n in lexicographical order by using the following algorithm.

    def LyndonWords(s,n):
        """Generate nonempty Lyndon words of length <= n over an s-symbol alphabet.
    The words are generated in lexicographic order, using an algorithm from
    J.-P. Duval, Theor. Comput. Sci. 1988, doi:10.1016/0304-3975(88)90113-2.
    As shown by Berstel and Pocchiola, it takes constant average time
    per generated word."""
        w = [-1] # set up for first increment
        while w:
            w[-1] += 1 # increment the last non-z symbol
            yield w
            m = len(w)
            while len(w) < n: # repeat word to fill exactly n syms
                w.append(w[-m])
            while w and w[-1] == s - 1: # delete trailing z's
                w.pop()

When called with s = 2 and n = 3 it will yield the following elements in order.

[0], [0, 0, 1], [0, 1, 1], [1]

Application

If one concatenates all the Lyndon words, listed in lexicographic order, which divides a number n the result is a De Bruijn sequence.

For example by concatenating the above Lyndon words one obtains

00010111

Which is a B(2,3) De Bruijn sequence as can be seen from the following diagram.

00010111
000
 001
  010
   101
    011
     111
0     11
00     1