T-tree: Principles and Implementation

T-tree

In computer science, a T-tree is a type of binary tree data structure used by main-memory databases such as Datablitz, eXtremeDB, MySQL Cluster, Oracle TimesTen and KairosMobileLite.

A T-tree is a balanced index tree data structure optimized for cases where both the index and the actual data are fully kept in memory, just as a B-tree is an index structure optimized for storage on block oriented secondary storage devices like hard disks. T-trees seek to gain the performance benefits of in-memory tree structures such as AVL trees while avoiding the large storage space overhead which is common to them.

T-trees do not keep copies of the indexed data fields within the index tree nodes themselves. Instead, they take advantage of the fact that the actual data is always in main memory together with the index so that they just contain pointers to the actual data fields.

The 'T' in T-tree refers to the shape of the node data structures in the original paper that first described this type of index.[1]


Performance

Although T-trees seem to be widely used for main-memory databases, recent research indicates that they actually do not perform better than B-trees on modern hardware:

Rao, Jun; Kenneth A. Ross (1999). "Cache conscious indexing for decision-support in main memory". Proceedings of the 25th International Conference on Very Large Databases (VLDB 1999). Morgan Kaufmann. pp. 78–89. http://www.vldb.org/dblp/db/conf/vldb/RaoR99.html. 

Kim, Kyungwha; Junho Shim, and Ig-hoon Lee (2007). "Cache conscious trees: How do they perform on contemporary commodity microprocessors?". Proceedings of the 5th International Conference on Computational Science and Its Applications (ICCSA 2007). Springer. pp. 189–200. doi:10.1007/978-3-540-74472-6_15. 

The main reason seems to be that the traditional assumption of memory references having uniform cost is no longer valid given the current speed gap between cache access and main memory access.

Node structures

A T-tree node usually consists of pointers to the parent node, the left and right child node, an ordered array of data pointers and some extra control data. Nodes with two subtrees are called internal nodes, nodes without subtrees are called leaf nodes and nodes with only one subtree are named half-leaf nodes. A node is called the bounding node for a value if the value is between the node's current minimum and maximum value, inclusively.

Bound values.

For each internal node, there exist leaf or half-leaf nodes that contain the predecessor of its smallest data value (called the greatest lower bound) and the successor of its largest data value (called the least upper bound). Leaf and half-leaf nodes can contain any number of data elements from one to the maximum size of the data array. Internal nodes keep their occupancy between predefined minimum and maximum numbers of elements.
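The node layout described above can be sketched roughly as follows. This is a minimal illustration, not the layout of any particular database: the names `TNode`, `kind` and `bounds` are hypothetical, the "extra control data" is assumed to be an AVL-style height, and plain keys stand in for the data pointers a real T-tree would store.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(eq=False)
class TNode:
    # Sorted array of keys; a real T-tree would hold pointers to records.
    items: List[Any] = field(default_factory=list)
    parent: Optional["TNode"] = field(default=None, repr=False)
    left: Optional["TNode"] = None
    right: Optional["TNode"] = None
    height: int = 1  # assumed control data: height used for AVL balancing

    def kind(self) -> str:
        """Classify the node by its number of subtrees."""
        children = (self.left is not None) + (self.right is not None)
        return ("leaf", "half-leaf", "internal")[children]

    def bounds(self, value: Any) -> bool:
        """True if value lies between the node's min and max, inclusive."""
        return bool(self.items) and self.items[0] <= value <= self.items[-1]
```

For example, a node holding `[10, 20, 30]` is the bounding node for 25 even though 25 is not stored in it, which is why a search that reaches a bounding node can stop there.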

Algorithms

Search

  • Search starts at the root node.
  • If the current node is the bounding node for the search value, then search its data array. Search fails if the value is not found in the data array.
  • If the search value is less than the minimum value of the current node, then continue the search in its left subtree. Search fails if there is no left subtree.
  • If the search value is greater than the maximum value of the current node, then continue the search in its right subtree. Search fails if there is no right subtree.
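The search steps above can be sketched as a short loop. This is an illustrative sketch under a few assumptions: `TNode` and `search` are hypothetical names, every node holds a non-empty sorted list of keys, and the data array is searched with a binary search from Python's `bisect` module.

```python
from bisect import bisect_left


class TNode:
    """Hypothetical minimal node: a sorted key array plus two child links."""

    def __init__(self, items, left=None, right=None):
        self.items = list(items)
        self.left = left
        self.right = right


def search(root, value):
    """Return the node that stores value, or None if the search fails."""
    node = root
    while node is not None:
        if value < node.items[0]:
            node = node.left          # below the node's minimum
        elif value > node.items[-1]:
            node = node.right         # above the node's maximum
        else:
            # Bounding node: the value is here or nowhere in the tree.
            i = bisect_left(node.items, value)
            return node if node.items[i] == value else None
    return None                       # ran out of subtrees
```

Note that only the minimum and maximum of each node are compared on the way down; the full data array is searched once, in the bounding node.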

Insertion

  • Search for a bounding node for the new value. If such a node exists, then:
    • Check whether there is still space in its data array. If so, insert the new value and finish.
    • If no space is available, remove the minimum value from the node's data array and insert the new value. Now proceed to the node holding the greatest lower bound for the node that the new value was inserted into. If the removed minimum value still fits in there, add it as the new maximum value of that node; otherwise create a new right subnode for that node.
  • If no bounding node was found, insert the value into the last node searched if it still fits. In this case the new value becomes either the new minimum or the new maximum value of that node. If the value does not fit, create a new left or right subtree.

If a new node was added then the tree might need to be rebalanced, as described below.
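The insertion steps above might look like the following sketch, with rebalancing left out. The names `TNode`, `insert`, `in_order` and `MAX_ITEMS` are hypothetical, and two details are assumptions rather than part of the description: duplicates are simply stored, and when a full bounding node has no left subtree, the displaced minimum goes into a new left child.

```python
from bisect import insort

MAX_ITEMS = 4  # assumed capacity of a node's data array


class TNode:
    def __init__(self, items, parent=None):
        self.items = list(items)  # sorted keys
        self.parent = parent
        self.left = None
        self.right = None


def insert(root, value, max_items=MAX_ITEMS):
    """Insert value; return (root, new_node_or_None). No rebalancing."""
    if root is None:
        node = TNode([value])
        return node, node
    node = root
    while True:  # stop at the bounding node or the last node searched
        if value < node.items[0] and node.left is not None:
            node = node.left
        elif value > node.items[-1] and node.right is not None:
            node = node.right
        else:
            break
    if len(node.items) < max_items:       # the value still fits
        insort(node.items, value)
        return root, None
    if value < node.items[0]:             # full, not bounding: new left child
        node.left = TNode([value], parent=node)
        return root, node.left
    if value > node.items[-1]:            # full, not bounding: new right child
        node.right = TNode([value], parent=node)
        return root, node.right
    smallest = node.items.pop(0)          # full bounding node: displace minimum
    insort(node.items, value)
    if node.left is None:                 # assumption: no left subtree yet
        node.left = TNode([smallest], parent=node)
        return root, node.left
    glb = node.left                       # greatest-lower-bound node
    while glb.right is not None:
        glb = glb.right
    if len(glb.items) < max_items:
        glb.items.append(smallest)        # becomes that node's new maximum
        return root, None
    glb.right = TNode([smallest], parent=glb)
    return root, glb.right


def in_order(node):
    """All keys in ascending order, for checking the result."""
    if node is None:
        return []
    return in_order(node.left) + node.items + in_order(node.right)
```

The second element of the returned pair is the newly created node, if any, which is where a caller would start checking balance on the way back to the root.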

Deletion

  • Search for the bounding node of the value to be deleted. If no bounding node is found, then finish.
  • If the bounding node does not contain the value, then finish.
  • Delete the value from the node's data array.

Now we have to distinguish by node type:

  • Internal node:

If the node's data array now has fewer than the minimum number of elements, move the greatest lower bound value of this node into its data array. Then proceed with one of the following two steps for the leaf or half-leaf node from which that value was removed.

  • Leaf node:

If this was the only element in the data array then delete the node. Rebalance the tree if needed.

  • Half leaf node:

If the node's data array can be merged with its leaf's data array without overflow then do so and remove the leaf node. Rebalance the tree if needed.
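A rough sketch of the deletion cases, again without rebalancing. All names (`TNode`, `delete`, `in_order`, `min_items`, `max_items`) are hypothetical. The sketch assumes the input tree is AVL-balanced, so that the single child of a half-leaf node is always a leaf; a real implementation would rebalance after removing a node to keep that true.

```python
class TNode:
    def __init__(self, items, left=None, right=None):
        self.items = list(items)  # sorted keys
        self.parent = None
        self.left, self.right = left, right
        for child in (left, right):
            if child is not None:
                child.parent = self


def delete(root, value, min_items=2, max_items=4):
    """Delete value if present and return the (possibly new) root."""
    node = root
    while node is not None and not (node.items[0] <= value <= node.items[-1]):
        node = node.left if value < node.items[0] else node.right
    if node is None or value not in node.items:
        return root                           # nothing to delete
    node.items.remove(value)
    if node.left is not None and node.right is not None:   # internal node
        if len(node.items) >= min_items:
            return root
        glb = node.left                       # greatest-lower-bound node
        while glb.right is not None:
            glb = glb.right
        node.items.insert(0, glb.items.pop()) # borrow its maximum
        node = glb                            # now a leaf or half-leaf
    if node.left is None and node.right is None:           # leaf node
        if not node.items:                    # it held the only element
            if node.parent is None:
                return None
            if node.parent.left is node:
                node.parent.left = None
            else:
                node.parent.right = None
        return root
    # Half-leaf node: merge with its leaf child if the result fits.
    child = node.left if node.left is not None else node.right
    if len(node.items) + len(child.items) <= max_items:
        if child is node.left:
            node.items = child.items + node.items
            node.left = None
        else:
            node.items = node.items + child.items
            node.right = None
    return root


def in_order(node):
    if node is None:
        return []
    return in_order(node.left) + node.items + in_order(node.right)
```

Borrowing from the greatest-lower-bound node keeps internal nodes well filled and pushes the shrinkage down to the leaves, where an emptied node can simply be removed.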

Rotation and balancing

A T-tree is implemented on top of an underlying self-balancing binary search tree. Specifically, Lehman and Carey's article describes a T-tree balanced like an AVL tree: it becomes out of balance when a node's child trees differ in height by at least two levels. This can happen after an insertion or deletion of a node. After an insertion or deletion, the tree is scanned from the leaf to the root. If an imbalance is found, one tree rotation or pair of rotations is performed, which is guaranteed to balance the whole tree.

When the rotation results in an internal node having fewer than the minimum number of items, items from the node's new child(ren) are moved into the internal node.
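The AVL-style rebalancing step can be sketched as below. This is a generic AVL rotation applied to nodes that carry a key array, with hypothetical names (`TNode`, `rotate_left`, `rotate_right`, `rebalance`); it does not implement the T-tree-specific refill of an underfull internal node after a rotation.

```python
class TNode:
    def __init__(self, items):
        self.items = list(items)
        self.parent = self.left = self.right = None
        self.height = 1


def height(node):
    return node.height if node is not None else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def balance(node):
    return height(node.left) - height(node.right)


def _rotate(node, up, down):
    """Lift node's `up` child above node; `down` is the opposite side."""
    pivot = getattr(node, up)
    moved = getattr(pivot, down)
    setattr(node, up, moved)              # pivot's inner subtree changes owner
    if moved is not None:
        moved.parent = node
    setattr(pivot, down, node)
    pivot.parent = node.parent
    if node.parent is not None:           # re-link the pivot under the old parent
        side = "left" if node.parent.left is node else "right"
        setattr(node.parent, side, pivot)
    node.parent = pivot
    update(node)
    update(pivot)
    return pivot


def rotate_right(node):
    return _rotate(node, "left", "right")


def rotate_left(node):
    return _rotate(node, "right", "left")


def rebalance(node):
    """Fix an imbalance at node with one rotation or a pair of rotations."""
    update(node)
    if balance(node) > 1:                 # left subtree too tall
        if balance(node.left) < 0:
            rotate_left(node.left)        # left-right case: first rotation
        return rotate_right(node)
    if balance(node) < -1:                # right subtree too tall
        if balance(node.right) > 0:
            rotate_right(node.right)      # right-left case: first rotation
        return rotate_left(node)
    return node
```

In the double-rotation cases, the node that ends up on top was previously a leaf or half-leaf and may hold very few items, which is the situation in which items are moved up from its new children.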

  

Figure: a typical T-tree

 

 

 
