Definition of PATRICIA - Sebbyastian/patricia GitHub Wiki

There seems to be a common definition which has become a synonym for "radix trie". However, according to the following sources, the term "radix trie" doesn't seem to sufficiently describe properties of the underlying data structure that would justify this synonymous regard. A PATRICIA trie may be a radix trie, but not all radix tries are PATRICIA tries.

  • D. R. Morrison published the original paper describing PATRICIA; "PATRICIA — Practical Algorithm To Retrieve Information Coded in Alphanumeric" can be found in the Journal of the ACM (1968), Volume 15, Issue 4, Pages 514-534.
  • D. E. Knuth is regarded to be an expert on algorithms and data structures. His book, TAOCP volume 3, describes PATRICIA tries in section 6.3.
  • A. G. McDowell has a created a web page entitled Patricia Variations, describing variations to PATRICIA.

These three sources seem to agree upon a common set of properties. Of those properties, the most significant to me is described perfectly by Morrison: "each new end brings into the library with it exactly one new branch, and each branch has exactly two exits". What this means is that in a trie containing the keys "tea", "ted" and "ten" there is no internal node for "te". This property is and always will be strictly adhered to in this project, for the sake of simplicity and efficiency.

Each branch stores the index of a bit to be tested, and the index increases as the depth increases. The root node of the trie described above would be identified by the left-most difference, coloured {\color[HTML]{FF0000} red} in the corresponding ASCII binary notation below.

  • "tea" -- "01110100 01100101 0110{\color[HTML]{FF0000} 0}001"
  • "ted" -- "01110100 01100101 0110{\color[HTML]{FF0000} 0}100"
  • "ten" -- "01110100 01100101 0110{\color[HTML]{FF0000} 1}110"

According to Knuth and McDowell, in such a traditional PATRICIA trie there can be no "te" node at all; it would be illegal to add such a node to the trie described above, because no key can be a prefix of another. This is one of the two properties violated by this project that I'm aware of. To permit this, I've used a variation similar to one described by McDowell in his page on PATRICIA variations linked to above.

As a result of this variation, some edges lead to nodes that have uncommon prefix bits. Unfortunately this violation seems unavoidable. I consider this unfortunate because the trie can no longer be iterated in order.

To be continued...