Berkeley DB Source Code Analysis (3) --- Btree Implementation (2)

david_zhao_wei

Article link: https://blog.csdn.net/smartpig_zw/article/details/7392273

__bam_ditem

In btree, on-page duplicate key/data pairs are stored this way:
1. The key is put onto the page only once: since it is duplicated, there is no
point in storing identical keys multiple times. Each data item of the
duplicated key is put onto the page;

2. In the index array, there are multiple index elements for this duplicate
key, all pointing at the same key offset. Since the index array is sorted by
the keys its elements point at, the elements referring to the same duplicate
key are contiguous: for a key with 3 duplicate key/data pairs, indx[i],
indx[i+1] and indx[i+2] all point at the same key on the page.

So when deleting the key at indx[i+1], we don't remove the key from the page,
since indx[i] and indx[i+2] still point at it. We simply move the elements
after indx[i+1] one position forward; afterwards indx[i] and indx[i+1] point
at that key, leaving two duplicate key/data pairs. Deleting a key/data pair
from a btree leaf page takes two deletions: first the key, then the data
item. The order can't be reversed.
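
The index-sharing scheme above can be sketched as a toy model in C. This is a minimal sketch under stated assumptions, not Berkeley DB's code: `toy_page`, `toy_ditem` and `key_is_shared` are hypothetical names, a page is modelled as an array of item slots plus a sorted index array of slot numbers, and, following the simplified view in the text, only key entries are modelled (the data items that share the real index array are left out).

```c
#include <assert.h>
#include <string.h>

#define MAX_ITEMS 16

/* Toy leaf page (hypothetical layout, not the real BDB PAGE struct):
 * items[] is the item area, inp[] is the sorted index array whose
 * entries are slot numbers into items[]. */
typedef struct {
    const char *items[MAX_ITEMS]; /* stored key bytes; NULL = free slot */
    int inp[MAX_ITEMS];           /* index array */
    int nent;                     /* number of index entries in use */
} toy_page;

/* Is the slot behind inp[indx] also referenced by another entry?
 * Because inp[] is sorted by key, a shared key can only be referenced
 * by the neighbouring entries. */
static int key_is_shared(const toy_page *p, int indx) {
    int slot = p->inp[indx];
    return (indx > 0 && p->inp[indx - 1] == slot) ||
           (indx + 1 < p->nent && p->inp[indx + 1] == slot);
}

/* Delete index entry indx, in the spirit of __bam_ditem on a dup key:
 * free the item bytes only when no other entry still refers to them. */
static void toy_ditem(toy_page *p, int indx) {
    assert(indx >= 0 && indx < p->nent);
    if (!key_is_shared(p, indx))
        p->items[p->inp[indx]] = NULL; /* last reference: free the item */
    /* shift the following index entries one position forward */
    memmove(&p->inp[indx], &p->inp[indx + 1],
            (size_t)(p->nent - indx - 1) * sizeof(p->inp[0]));
    p->nent--;
}
```

For a key with three duplicates at entries 0 to 2, deleting entry 1 only shifts the index array; the key bytes are freed only when the last entry that refers to them is deleted.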

Deleting key/data pairs

1. In DBC->del, we only mark the key/data pair as deleted (B_DELETE) and mark
the cursor as pointing to a deleted k/d pair (C_DELETED); we don't actually
remove the k/d pair from the page, unless the cursor is closed while it is
still pointing to a deleted k/d pair. In that special case we remove the
single k/d pair it points to. After a data item is marked deleted, it can
still be found/located internally by the search functions, but it is never
returned to the user. The space it occupies can be overwritten when inserting
a k/d pair that belongs at exactly the same page and location.

Thus, if we use DB->del to delete a k/d pair, it is immediately removed from
the db; if we use DBC->del to iterate over the db and delete each k/d pair,
none except the last one is physically removed from the db. This avoids
frequent tree structure changes (split/rsplit), which are expensive
operations, but it can also waste a lot of space.

I think we should add a DB_FORCE flag for DBC->close. When it is specified, we
know that no other cursor is pointing at the k/d pair, so when our cursor is
about to move from the current page to another page, we can delete all k/d
pairs on it that are marked B_DELETE. We don't remove pairs on each DBC->del
call because that would make the cursor movement operations harder to
implement.
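
The proposed purge could look roughly like the following toy sketch. It is hypothetical throughout: the force flag is the author's proposal rather than an existing option, `toy_purge_deleted` and `toy_leave_page` are invented names, and the code assumes, as the proposal does, that no other cursor references any pair on the page.

```c
#include <assert.h>
#include <stdbool.h>

#define NITEMS 8

/* Hypothetical toy page; not the real BDB layout. */
typedef struct {
    int keys[NITEMS];
    bool deleted[NITEMS]; /* plays the role of B_DELETE */
    int n;
} toy_leaf;

/* Compact the page in one pass, dropping every marked pair.
 * Only safe under the assumption that no other cursor references
 * any pair on this page. Returns the number of pairs removed. */
static int toy_purge_deleted(toy_leaf *p) {
    assert(p->n >= 0 && p->n <= NITEMS);
    int kept = 0;
    for (int i = 0; i < p->n; i++) {
        if (p->deleted[i])
            continue;              /* skip marked pairs */
        p->keys[kept] = p->keys[i];
        p->deleted[kept] = false;
        kept++;
    }
    int removed = p->n - kept;
    p->n = kept;
    return removed;
}

/* Hypothetical hook run when the cursor leaves a page: with the
 * proposed force flag set, purge the page being left. */
static void toy_leave_page(toy_leaf *p, bool force) {
    if (force)
        (void)toy_purge_deleted(p);
}
```

Compacting the whole page in a single pass costs O(n) once per page, instead of an O(n) shift for every individual delete.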

__db_ret
Returns a specified key/data pair, given the page pointer (already locked and
fetched from mpool), pgno and index.

__bam_getboth_finddatum

Handles the DB_GET_BOTH and DB_GET_BOTH_RANGE flags in DB/DBC->get.

If DB_DUP is set but DB_DUPSORT is not, dbp->dup_compare is NULL, so we do a
linear search and only look for an exact match even if RANGE is specified;
that is, without DUPSORT, DB_GET_BOTH_RANGE behaves identically to
DB_GET_BOTH, which is undocumented.

Otherwise both DB_DUP and DB_DUPSORT are specified, and we do a binary search
on the leaf page. The btree search in the off-page duplicate (OPD) tree is
done by __bamc_search, not by this function.
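
The two search strategies can be contrasted in a short sketch over a plain array of duplicate data items. It is hypothetical: `toy_finddatum` and `int_cmp` are invented names, integers stand in for data items, and a NULL comparison function stands in for a NULL dbp->dup_compare. With a comparator, the range case returns the first item greater than or equal to the target, which is what DB_GET_BOTH_RANGE asks for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*dup_cmp_fn)(int a, int b);

/* Returns the index of the matching duplicate, or -1.
 * cmp == NULL models DB_DUP without DB_DUPSORT: linear scan for an
 *   exact match, even when a range lookup is requested.
 * cmp != NULL models DB_DUPSORT: binary search; with range == true,
 *   return the first item >= target. */
static int toy_finddatum(const int *dups, int n, int target,
                         dup_cmp_fn cmp, bool range) {
    assert(n >= 0);
    if (cmp == NULL) {
        for (int i = 0; i < n; i++)
            if (dups[i] == target)
                return i;
        return -1;                 /* range is ignored here */
    }
    int lo = 0, hi = n;            /* find first item >= target in [lo, hi) */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (cmp(dups[mid], target) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == n)
        return -1;                 /* every item is smaller than target */
    if (!range && cmp(dups[lo], target) != 0)
        return -1;                 /* exact match required */
    return lo;
}

/* Example comparator for the sorted case. */
static int int_cmp(int a, int b) { return (a > b) - (a < b); }
```

Searching `{10, 20, 30}` for 15 with `int_cmp` and `range` set yields index 1 (the item 20); the same lookup on an unsorted array with a NULL comparator fails, because only exact matches count there.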

__bamc_put

In btree we can't choose where a k/d pair is stored, because its position is
determined by the key's value and the data's value. The only exception to
this rule is when the btree allows duplicates (DB_DUP set) but not sorted
duplicates (DB_DUPSORT not set). In that case we can insert a data item before
or after (DB_BEFORE/DB_AFTER) the key/data pair the cursor currently points
at, as a duplicate data item for the same key.
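
For the unsorted-duplicate case, DB_BEFORE/DB_AFTER placement amounts to an array insertion relative to the cursor's position. A toy sketch with hypothetical names (`toy_dupset`, `toy_put_dup`, and `TOY_BEFORE`/`TOY_AFTER` as stand-ins for the real flags), treating one key's duplicate data items as an array of integers:

```c
#include <assert.h>
#include <string.h>

#define MAX_DUPS 8

/* Hypothetical toy: the duplicate data items of one key, in stored order. */
typedef struct {
    int data[MAX_DUPS];
    int n;
} toy_dupset;

/* Stand-ins for DB_BEFORE / DB_AFTER. */
enum toy_pos { TOY_BEFORE, TOY_AFTER };

/* Insert d next to the item at cursor index cur and return the
 * index of the newly inserted item. */
static int toy_put_dup(toy_dupset *s, int cur, int d, enum toy_pos pos) {
    assert(s->n < MAX_DUPS && cur >= 0 && cur < s->n);
    int at = (pos == TOY_BEFORE) ? cur : cur + 1;
    /* open a gap at position 'at' by shifting the tail one slot right */
    memmove(&s->data[at + 1], &s->data[at],
            (size_t)(s->n - at) * sizeof(s->data[0]));
    s->data[at] = d;
    s->n++;
    return at;
}
```

With a sorted duplicate set this kind of positional insert would break the ordering, which is why the placement flags only make sense without DB_DUPSORT.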

Other flags such as DB_OVERWRITE_DUP, DB_NODUPDATA and DB_NOOVERWRITE all
control how duplicate data items are handled, rather than controlling
movement or position.