上周看MIT, <<Introduction to algorithms>> 的时候, 觉得B tree 实现起来有点麻烦, 正好可以练习一下。
花了一天时间,晃晃悠悠, 终于写完了, 非常的没有效率啊。
逻辑不复杂,但很多分支。 关键是下标容易出错。 书上的删除方法不是很明了, 还费了些周折。 在动手写之前关键是明白B tree到底是怎么实现的。
我先实现了insert, 因为这个方法简单一些, 也借机加深对B tree的认识。 再实现的删除方法。
关键的注释在源代码(下载) 里面都有。insert写到一半, 发现我患了一个低级错误: 方法的传参混乱, 比如, insert()是在Node里面, 所以, 必须由一个Node来调用, 而方法里面又有要求传递一个Node作为参数。 这样一来的话, 一个node可以修改另一个node了。 如果要求一个node可以修改另一个node, 可以把方法设置为static, 如果不用, 那么参数就重复传递了。 但我没有改过来, 后面所有的方法都是这么一个模式:
node.method(node, ...)
我只是做了简单的测试, insert方法是依次插入值[1,18], 每插入一个值, 和参考【1】的树比较, 看是否一样。 delete的测试是首先构建一个树, 包含值[1,18], 然后从18到1, 依次删除并验证。
手都酸了。。
参考:
2. Collection of BTree info
3. MIT <<Introduction to algorithms>>
附, 参考3上提供的方法, 我加了一些注释:
Insert
B-TREE-SPLIT-CHILD(x, i, y)
1 z ← ALLOCATE-NODE()
2 leaf[z] ← leaf[y]
3 n[z] ← t - 1
4 for j ← 1 to t - 1
5 do keyj[z] ← keyj+t[y]
6 if not leaf [y]
7 then for j ← 1 to t
8 do cj[z] ← cj+t[y]
9 n[y] ← t - 1
10 for j ← n[x] + 1 downto i + 1
11 do cj+1[x] ← cj [x]
12 ci+1[x] ← z
13 for j ← n[x] downto i
14 do keyj+1[x] ← keyj[x]
15 keyi[x] ← keyt[y]
16 n[x] ← n[x] + 1
17 DISK-WRITE(y)
18 DISK-WRITE(z)
19 DISK-WRITE(x)
B-TREE-INSERT(T, k)
1 r ← root[T]
2 if n[r] = 2t - 1
3 then s ← ALLOCATE-NODE()
4 root[T] ← s
5 leaf[s] ← FALSE
6 n[s] ← 0
7 c1[s] ← r
8 B-TREE-SPLIT-CHILD(s, 1, r)
9 B-TREE-INSERT-NONFULL(s, k)
10 else B-TREE-INSERT-NONFULL(r, k)
B-TREE-INSERT-NONFULL(x, k)
1 i ← n[x]
2 if leaf[x]
3 then while i ≥ 1 and k < keyi[x]
4 do keyi+1[x] ← keyi[x]
5 i ← i - 1
6 keyi+1[x] ← k
7 n[x] ← n[x] + 1
8 DISK-WRITE(x)
9 else while i ≥ 1 and k < keyi[x]
10 do i ← i - 1
11 i ← i + 1
12 DISK-READ(ci[x])
13 if n[ci[x]] = 2t - 1
14 then B-TREE-SPLIT-CHILD(x, i, ci[x])
15 if k> keyi[x]
16 then i ← i + 1
17 B-TREE-INSERT-NONFULL(ci[x], k)
Deletion
There are two special cases to consider when deleting an element:
- the element in an internal node may be a separator for its child nodes
- deleting an element may put it under the minimum number of elements and children.
Algorithm
-
If the key k is in node x and x is a leaf, delete the key k from x.
-
If the key k is in node x and x is an internal node, do the following.
-
If the child y that precedes k in node x has at least t keys, then find the predecessor k′ of k in the subtree rooted at y. Recursively delete k′, and replace k by k′ in x. (Finding k′ and deleting it can be performed in a single downward pass.), that is, replace k with the largest key of the left subtree (??????If y is a leaf within t keys, after the deletion, y has t - 1 keys. Then, it's possible that an element is deleted from y next time, which result in y 's key size to be t - 2, ??? see rule 3)
-
Symmetrically, if the child z that follows k in node x has at least t keys, then find the successor k′ of k in the subtree rooted at z. Recursively delete k′, and replace k by k′ in x. (Finding k′ and deleting it can be performed in a single downward pass.), that is, replace k with the smallest key of the right subtree
-
Otherwise, if both y and z have only t - 1 keys, merge k and all of z into y, so that x loses both k and the pointer to z, and y now contains 2t - 1 keys. Then, free z and recursively delete k from y. that is, merge the children, that is, merge the two children
-
-
If the key k is not present in internal node x, determine the root ci[x] of the appropriate subtree that must contain k, if k is in the tree at all. If ci[x] has only t - 1 keys, execute step 3a or 3b as necessary to guarantee that we descend to a node containing at least t keys. Then, finish by recursing on the appropriate child of x.(while traversing down)
-
If ci[x] has only t - 1 keys but has an immediate sibling with at least t keys, give ci[x] an extra key by moving a key from x down into ci[x], moving a key from ci[x]'s immediate left or right sibling up into x, and moving the appropriate child pointer from the sibling into ci[x].
-
If ci[x] and both of ci[x]'s immediate siblings have t - 1 keys, merge ci[x] with one sibling, which involves moving a key from x down into the new merged node to become the median key for that node.
-
------borrow an element from the sibling, otherwise, merge the sibling and the key between the sibling and ci[x]. that is, to ensure the lower bound of every node(the special case 2)