B‐Tree - tenji/ks GitHub Wiki
B 树是一种自平衡(self-balancing)树数据结构,它维护排序的数据并允许在对数时间内搜索、顺序访问、插入和删除。B 树扩展了二叉搜索树,允许节点具有两个以上的子节点。与其他自平衡二叉搜索树不同,B 树非常适合读写相对较大数据块的存储系统,例如数据库和文件系统。
B 树和 B- 树的区别是什么?
B 树和 B- 树指的都是 B 树,之所以会有 B- 树这么个中文翻译,是因为 B 树的英文是 B-Tree,所以翻译成 B- 树也没问题,更多时候是翻译成B 树。
B-Tree 中的
B
是什么的缩写?代表什么含义?
One of the inventors of the data structure, in an interview you can view in this video at about 16:05, said:
You just have no idea what a lunchtime conversation can turn into. So there we were, [indistinct] and I, at lunch , we had to give the thing a name. And we were, so, B, we were thinking… B is, you know… We were working for Boeing at the time, we couldn't use the name without talking to the lawyers. So, there is a B. It has to do with balance, another B. Bayer was the senior author, who did have several years older than I am and had many more publications than I did. So there is another B. And so, at the lunch table we never did resolve whether there was one of those that made more sense than the rest. What really lies to say is: the more you think about what the B in B-trees means, the better you understand B-trees.
简单来说,B
可以代表 Boeing,可以代表 Balance,可以代表 Bayer,也可以代表你认为可以代表的任何含义。
跟传统 BST 二叉搜索树的区别是什么?有了 BST 为什么还需要 B 树?
与传统的二叉搜索树不同,B 树的特点是可以在单个节点中存储大量键,这就是为什么它们也被称为“大键”(Large Key)树。B 树中的每个节点可以包含多个键,这使得树具有更大的分支因子,从而具有更浅的高度。这种浅高度会减少磁盘 I/O,从而加快搜索和插入操作的速度。 B 树特别适合具有缓慢且大量数据访问的存储系统,例如硬盘驱动器、闪存和 CD-ROM。
B 树是由 Rudolf Bayer 和 Edward M. McCreight 在波音研究实验室工作时发明的,目的是有效管理大型随机访问文件的索引页。基本假设是索引非常庞大,以至于主内存中只能容纳树的一小部分。Bayer 和 McCreight 的论文 Organization and maintenance of large ordered indices 于 1970 年 7 月首次流传,后来发表在 Acta Informatica 上。
根据 Knuth 的定义,m 阶 B 树(a B-tree of order m)是满足以下性质的树:
- 每个节点最多有 m 个子节点;
- 除根和叶外,每个节点至少有 ⌈m/2⌉ 个子节点;
- 根节点至少有两个子节点,除非它是叶子;
- 所有叶子都出现在同一水平面上;
- 具有 k 个子节点的非叶节点包含 k−1 个键。
在实际使用中,一般选择几阶的 B 树?阶树是越大越好吗?
实际业务中 B 树的阶数一般大于 100,存储大量数据,B 树高度也会很低,查询效率会更高。
三层的 B 树,阶数为 1024,最多容纳多少个元素?
节点总数:10241 + 10242 + 10243 ~= 11 亿个节点,假如每个节点放一个元素就是 11 亿个。所以在 10 亿个数据中找目标值,常规小于 3 次磁盘 IO 即可找到目标值,比平衡二叉树的 30 次提升了不少。
内部节点是除叶节点和根节点之外的所有节点。它们通常表示为一组有序的元素和子指针。
根节点的子节点数量与内部节点的上限相同,但没有下限。例如,当整棵树中的元素少于 L−1 时,根将是树中唯一没有子节点的节点。
根节点的子节点数量与内部节点的上限相同,但没有下限。
数据库存储引擎中使用 B 树实现的索引结构简单示意图:
- 搜索:
O(log n)
- 插入:
O(log n)
- 删除:
O(log n)
// A BTree
class Btree {
public BTreeNode root; // Pointer to root node
public int t; // Minimum degree
// Constructor (Initializes tree as empty)
Btree(int t)
{
this.root = null;
this.t = t;
}
// function to traverse the tree
public void traverse()
{
if (this.root != null)
this.root.traverse();
System.out.println();
}
// function to search a key in this tree
public BTreeNode search(int k)
{
if (this.root == null)
return null;
else
return this.root.search(k);
}
}
// A BTree node
class BTreeNode {
int[] keys; // An array of keys
int t; // Minimum degree (defines the range for number of keys)
BTreeNode[] C; // An array of child pointers
int n; // Current number of keys
boolean leaf; // Is true when node is leaf. Otherwise false
// Constructor
BTreeNode(int t, boolean leaf)
{
this.t = t;
this.leaf = leaf;
this.keys = new int[2 * t - 1];
this.C = new BTreeNode[2 * t];
this.n = 0;
}
...
}
// A function to search a key in the subtree rooted with this node.
BTreeNode search(int k)
{ // returns NULL if k is not present.
// Find the first key greater than or equal to k
int i = 0;
while (i < n && k > keys[i])
i++;
// If the found key is equal to k, return this node
if (keys[i] == k)
return this;
// If the key is not found here and this is a leaf
// node
if (leaf == true)
return null;
// Go to the appropriate child
return C[i].search(k);
}
Princeton key-value pairs B-Tree Demo:https://algs4.cs.princeton.edu/code/edu/princeton/cs/algs4/BTree.java.html
Aspect | B-Tree | B+ Tree |
---|---|---|
Structure | Separate leaf nodes for data storage and internal nodes for indexing | Nodes store both keys and data values |
Leaf Nodes | Leaf nodes form a linked list for efficient range-based queries | Leaf nodes do not form a linked list |
Order | Higher order (more keys) | Lower order (fewer keys) |
Key Duplication | Typically allows key duplication in leaf nodes | Usually does not allow key duplication |
Disk Access | Better disk access due to sequential reads in a linked list structure | More disk I/O due to non-sequential reads in internal nodes |
Applications | Database systems, file systems, where range queries are common | In-memory data structures, databases, general-purpose use |
Performance | Better performance for range queries and bulk data retrieval | Balanced performance for search, insert, and delete operations |
Memory Usage | Requires more memory for internal nodes | Requires less memory as keys and values are stored in the same node |
更多关于 B+ 树,传送门