DSAA Supplement: B-Trees and B+ Trees

Copyright notice: this is the blogger's original article and may not be reproduced without the blogger's permission. https://blog.csdn.net/LoveStackover/article/details/80690712

1. Review

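  My earlier post, DSAA之B-tree(六), recorded a B-tree definition that was neither one thing nor the other: it felt like a B+ tree, yet it had no links between the bottom-level nodes. So it is worth re-examining the concepts of the B-tree and the B+ tree today. The content of this post is excerpted from Wikipedia.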

2. B-Trees

Definition

According to Knuth’s definition, a B-tree of order m is a tree which satisfies the following properties:

  • Every node has at most m children.
  • Every non-leaf node (except root) has at least ⌈m/2⌉ children. (⌈ ⌉ denotes rounding up, i.e. the ceiling.)
  • The root has at least two children if it is not a leaf node.
  • A non-leaf node with k children contains k−1 keys.
  • All leaves appear in the same level.

Each internal node’s keys act as separation values which divide its subtrees. For example, if an internal node has 3 child nodes (or subtrees) then it must have 2 keys: a1 and a2. All values in the leftmost subtree will be less than a1, all values in the middle subtree will be between a1 and a2, and all values in the rightmost subtree will be greater than a2. In other words, the keys of an internal (non-leaf) node index all of its children; for example, [a1, a2] indexes the node's three children.
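The separator-key rule above is what drives lookup. Below is a minimal Python sketch, assuming a hypothetical `Node` class with a sorted `keys` list and a `children` list (empty for leaves); these names are illustrative and do not come from the quoted sources.

```python
from bisect import bisect_left


class Node:
    """Hypothetical B-tree node: sorted keys plus child pointers (none for a leaf)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def search(node, key):
    """Return True if key is stored anywhere in the subtree rooted at node."""
    while True:
        # Index of the first key >= the target; it is also the child to descend into.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True  # B-trees keep values in every node, not only in leaves
        if not node.children:
            return False  # reached a leaf without finding the key
        node = node.children[i]


# An internal node with 2 keys (a1=10, a2=20) and therefore 3 subtrees.
root = Node([10, 20], [Node([1, 5]), Node([12, 15]), Node([25, 30])])
print(search(root, 15), search(root, 10), search(root, 11))  # True True False
```

Binary search inside the node is a design choice for the sketch; a linear scan over the keys would behave the same way.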

  • Internal nodes
    • Internal nodes are all nodes except for leaf nodes and the root node.
    • They are usually represented as an ordered set of elements and child pointers. Every internal node contains a maximum of U children and a minimum of L children. Thus, the number of elements is always 1 less than the number of child pointers (the number of elements is between L−1 and U−1). Put simply, an internal node has one fewer key than it has children; equivalently, U is m and L is ⌈m/2⌉.
    • U must be either 2L or 2L−1; therefore each internal node is at least half full. The relationship between U and L implies that two half-full nodes can be joined to make a legal node, and one full node can be split into two legal nodes (if there’s room to push one element up into the parent). These properties make it possible to delete and insert new values into a B-tree and adjust the tree to preserve the B-tree properties. Deletions and insertions can therefore trigger a merge or a split, as the earlier B-tree post also mentioned.
  • The root node
    • The root node’s number of children has the same upper limit as internal nodes, but has no lower limit. That is, the root's number of children is not bound by the ⌈m/2⌉ minimum.
      • For example, when there are fewer than L−1 elements in the entire tree, the root will be the only node in the tree with no children at all.
  • Leaf nodes
    • Leaf nodes have the same restriction on the number of elements, but have no children, and no child pointers.
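To see why the relationship between U and L keeps a split legal, here is a small Python sketch. It assumes a common textbook formulation in which a node is split when an insertion leaves it with U keys, one more than allowed; the function names `key_bounds` and `split_overflowing` are hypothetical.

```python
def key_bounds(m):
    """(min_keys, max_keys) for a non-root node of a B-tree of order m (U = m, L = ceil(m/2))."""
    lower = (m + 1) // 2  # L = ceil(m / 2), computed with integers
    return lower - 1, m - 1


def split_overflowing(keys, m):
    """Split a sorted list of m keys (one too many) into left half, median, right half."""
    assert len(keys) == m, "expected a node that has just overflowed"
    mid = m // 2
    # The median moves up into the parent; the two halves become sibling nodes.
    return keys[:mid], keys[mid], keys[mid + 1:]


for m in (3, 4, 5, 6):
    lo, hi = key_bounds(m)
    left, up, right = split_overflowing(list(range(1, m + 1)), m)
    # Both halves stay within the legal occupancy range for order m.
    assert lo <= len(left) <= hi and lo <= len(right) <= hi
    print(m, left, up, right)
```

For m = 5 this splits [1, 2, 3, 4, 5] into [1, 2] and [4, 5] with 3 pushed up, and both halves hold the minimum of 2 keys.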

  All of these properties have come up before, but one point is important:

B-trees keep values in every node in the tree, and may use the same structure for all nodes. However, since leaf nodes never have children, the B-trees benefit from improved performance if they use a specialized structure.

Example

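  [Figure: example B-tree. The image comes from an external source and will be removed on request if it infringes copyright.]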
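  This differs from the B-tree definition studied earlier: DSAA defines the structure in the style of a B+ tree, whereas here every node holds keys, and the keys within each node are sorted and unique.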

Maximum depth

Let h be the height of the classic B-tree. Let n > 0 be the number of entries in the tree. Let d be the minimum number of children an internal (non-root) node can have. For an ordinary B-tree, d = ⌈m/2⌉.

$$h \le \log_d\left(\frac{n+1}{2}\right)$$

  This can simply be memorized as a result; the derivation is covered in detail in Introduction to Algorithms.
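For readers who want more than the bare result, the bound can be reconstructed by counting the fewest entries a tree of height h can hold. This is a sketch of the usual textbook argument, under the assumptions that height is counted in edges (a lone root has h = 0), that d ≥ 2, that the root holds at least 1 key and has at least 2 children, and that every other node holds at least d − 1 keys. At depth i ≥ 1 there are then at least 2d^{i−1} nodes, so:

```latex
\begin{aligned}
n &\ge 1 + (d-1)\sum_{i=1}^{h} 2d^{\,i-1} \\
  &= 1 + 2(d-1)\cdot\frac{d^{h}-1}{d-1} \\
  &= 2d^{h} - 1, \\
\text{so}\quad d^{h} &\le \frac{n+1}{2}
\quad\Longrightarrow\quad
h \le \log_d\!\left(\frac{n+1}{2}\right).
\end{aligned}
```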

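3. B+ Trees

  Because the B+ tree is an improvement on the B-tree, it can be learned through the differences between the two trees. The following is from Differences between B trees and B+ trees.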


  • Advantages of B+ trees:
    • Because B+ trees don’t have data associated with interior nodes, more keys can fit on a page of memory. Therefore, it will require fewer cache misses in order to access data that is on a leaf node. Understanding this point requires some familiarity with how caching works.
    • The leaf nodes of B+ trees are linked, so doing a full scan of all objects in a tree requires just one linear pass through all the leaf nodes.
      • A B tree, on the other hand, would require a traversal of every level in the tree. This full-tree traversal will likely involve more cache misses than the linear traversal of B+ leaves.
  • Advantage of B trees:
    • Because B trees contain data with each key, frequently accessed nodes can lie closer to the root, and therefore can be accessed more quickly.
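The linked-leaf advantage listed above can be illustrated with a short Python sketch. It models only the leaf level of a B+ tree; the `Leaf` class and the helper names are hypothetical, and for simplicity the range scan starts from the leftmost leaf, whereas a real B+ tree would typically descend the index to find the starting leaf.

```python
class Leaf:
    """Hypothetical B+ tree leaf: sorted keys, their values, and a link to the next leaf."""

    def __init__(self, keys, values):
        self.keys = keys
        self.values = values
        self.next = None


def link_leaves(leaves):
    """Chain the leaves left to right and return the leftmost one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def scan(first_leaf):
    """Yield every (key, value) pair in one linear pass over the leaf chain."""
    leaf = first_leaf
    while leaf is not None:
        yield from zip(leaf.keys, leaf.values)
        leaf = leaf.next


def range_scan(first_leaf, lo, hi):
    """Yield pairs with lo <= key <= hi, stopping as soon as a key exceeds hi."""
    for key, value in scan(first_leaf):
        if key > hi:
            break
        if key >= lo:
            yield key, value


head = link_leaves([Leaf([1, 3], ["a", "b"]), Leaf([5, 7], ["c", "d"]), Leaf([9], ["e"])])
print(list(scan(head)))              # [(1, 'a'), (3, 'b'), (5, 'c'), (7, 'd'), (9, 'e')]
print(list(range_scan(head, 3, 7)))  # [(3, 'b'), (5, 'c'), (7, 'd')]
```

No interior node is touched during the scan, which is the point of the comparison with a plain B-tree traversal.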

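  In summary, the B+ tree improves on the B-tree in two respects: disk-access performance and support for sequential search. Both, of course, support random lookup. There is also disagreement over whether a node's number of children should equal its number of keys or differ from it by one. Whichever convention an implementation uses, it only needs to apply it consistently. The Wikipedia and Stack Overflow passages quoted here both describe the B+ tree in terms of the B-tree's characteristics.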

4. Two Written-Test Questions

[Figure: question 1]
[Figure: question 2]

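  For the first question the answer is A: B-trees do not support sequential search. For the second question, recall that when the root is the only node in the tree it has 0 children, and otherwise it has at least 2 children and stores at least 1 key. Assume every node stores exactly 1 key. Treating the tree as a perfect binary tree then gives 2^k − 1 = 2047, so k = 11. The Wikipedia formula above, with d = 2, gives log_2((2047 + 1)/2) = 10, but because that height is counted from zero, the number of levels is 10 + 1 = 11. There is an ambiguity here: in some definitions the leaf level of a B-tree holds no data, whereas the Wikipedia and Stack Overflow passages quoted above follow the definition in which the leaf level does hold data. Depending on the definition, either A or B may be correct.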
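The arithmetic for the second question can be checked with a few lines of Python. This is a sketch under the same assumptions as the discussion above: 2047 keys, every node holding one key so that d = 2, and height counted in edges. The function name `max_height` is hypothetical, and it uses integer arithmetic to avoid floating-point logarithms.

```python
def max_height(n, d):
    """Largest h (counted in edges) such that 2 * d**h - 1 <= n, for n >= 1 and d >= 2."""
    h = 0
    # 2 * d**h - 1 is the fewest keys a tree of height h can hold.
    while 2 * d ** (h + 1) - 1 <= n:
        h += 1
    return h


n_keys = 2047
d = 2  # assumed: each node stores one key, so each internal node has 2 children
h = max_height(n_keys, d)
levels = h + 1  # the formula counts from zero, so add one to get the level count
print(h, levels)  # prints: 10 11
```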
