db_013 - zhangjaycee/real_tech GitHub Wiki

索引:哈希和树(Hashing and Tree)

1. Hashing

首先哈希表可以分为静态哈希和动态哈希:

1.1 Static hashing[4]

Static hashing faces the conflict issue, which can be solved by "linear probing" or "chaining with separate lists".

  • linear probing on disk

When using hashing on disk, the IO unit of linear probing is disk blocks. When search for a key, we start at a block and proceeds linearly until the key is found or a block is not full.

  • chaining with separate lists

When using chained list hashing on disk, one disk block can store multiple keys. Thus when keys are small and blocks are big, the oveflows will be very rare and we can expect search for a key using one I/O.

2. Dynamic hashing

Hash bucket is the on-disk structure. Due to the slow disk access, several methods are proposed to dynamically extend the hash table. For example, "extendable hashing" by Knuth (1973), and "linear hashing" by Litwin (1978).

A drawback of dynamic hashing is that it demand one more indirection to find the on-disk address when searching a item.

  • Extendable hashing[1][2]

The main idea is to split a bucket when it is going to overflow, and the bit number used of the hash key will consequently grow.

  • Linear hashing[3][4]

"gradually increasing and decreasing the range of hash functions with the size of the set."


参考:

[1] 《数据库系统概念》第11章, p291

[2] 《数据结构与算法分析--c语言描述》, p125

[3] 《计算机算法引论》, p100

[4] R. Pagh, “Basic External Memory Data Structures,” Algorithms Mem. Hierarchies, pp. 14–35, 2003.

2. Tree

B-trees

B-Tree and its variants are widely used for on disk searching by DBMSs.

Radix-tree

基数树也成为Tire(tire来源于检索retrieval这个词,发音同tree)或者字典树、前缀树,每个内部节点只存目标元素的一部分,根到叶子的路径为目标的结果。不需要哈希函数的计算时间,方便可以范围统计,如果不存在可以提前终止。内核中page cache就用了这种索引结构的变种。

Other trees

RB-tree, like AVL-tree, is a kind of balanced searching tree.

Trie tree (radix tree for example), is a kind of indexing tree more friendly to memory compared with disk.[5]


[5] Alvarez, Victor, et al. "A comparison of adaptive radix trees and hash tables." Data Engineering (ICDE), 2015 IEEE 31st International Conference on. IEEE, 2015.

⚠️ **GitHub.com Fallback** ⚠️