MIT 6.830 Lab 5: B+ Tree Index Lab Report

跳着迪斯科学Java

Article link: https://blog.csdn.net/weixin_45834777/article/details/121209402

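This post is a lab report for the MIT 6.830 B+ tree index lab, covering the implementation of search, insertion and deletion. The lab requires understanding the structure of the B+ tree, including the root pointer page, internal pages, leaf pages and header pages. You implement lookup, which finds the appropriate leaf page for a given key, and you handle page splits, tuple redistribution and page merging. Working through the lab gives a solid understanding of B+ tree search, insertion and deletion and of how they are used in database indexes.

(Summary generated automatically by CSDN.)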

I. Lab Overview

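Lab 5 is about implementing a B+ tree index, mainly its search, insert and delete operations. Search simply descends the tree recursively, guided by the B+ tree's key ordering. Insertion must handle page splits when a page is full. Deletion must handle redistributing tuples (when a page has become too empty while an adjacent sibling is fairly full) and merging siblings (when two adjacent pages are both fairly empty). That is the overall scope of this lab.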
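The split, redistribute and merge decisions described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not SimpleDB's actual split or merge code: a "page" is just a sorted List<Integer> of keys, the "half full" threshold is maxTuples / 2, and LeafOpsSketch is a hypothetical name. In a standard B+ tree a leaf split copies the first key of the new right page up into the parent, whereas an internal-page split pushes the middle key up.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the decisions behind B+ tree insert and delete.
// NOT SimpleDB code: a "page" is just a sorted List<Integer> of keys.
final class LeafOpsSketch {
    private LeafOpsSketch() {}

    // Result of splitting a full leaf: two halves plus the separator key
    // that gets copied up into the parent internal node.
    static final class Split {
        final List<Integer> left;
        final List<Integer> right;
        final int separator;

        Split(List<Integer> left, List<Integer> right, int separator) {
            this.left = left;
            this.right = right;
            this.separator = separator;
        }
    }

    // Split a sorted, full leaf: the lower half stays, the upper half moves
    // to a new right page, and that page's first key becomes the separator.
    static Split splitLeaf(List<Integer> fullLeaf) {
        int mid = fullLeaf.size() / 2;
        List<Integer> left = new ArrayList<>(fullLeaf.subList(0, mid));
        List<Integer> right = new ArrayList<>(fullLeaf.subList(mid, fullLeaf.size()));
        return new Split(left, right, right.get(0));
    }

    // Called when a delete leaves a page less than half full. If the sibling
    // holds more than half a page it can lend tuples (redistribute);
    // otherwise the two pages are merged into one.
    static String onUnderflow(int siblingSize, int maxTuples) {
        int half = maxTuples / 2;   // hypothetical "half full" threshold
        return siblingSize > half ? "redistribute" : "merge";
    }
}
```

For example, splitting the full leaf [1, 2, 3, 4, 5] keeps [1, 2] on the old page, moves [3, 4, 5] to the new page and copies 3 up into the parent.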

In this lab you will implement a B+ tree index for efficient lookups and range
scans. We supply you with all of the low-level code you will need to implement
the tree structure. You will implement searching, splitting pages,
redistributing tuples between pages, and merging pages.


You may find it helpful to review sections 10.3–10.7 in the textbook, which
provide detailed information about the structure of B+ trees as well as
pseudocode for searches, inserts and deletes.

As described by the textbook and discussed in class, the internal nodes in B+
trees contain multiple entries, each consisting of a key value and a left and a
right child pointer. Adjacent keys share a child pointer, so internal nodes
containing m keys have m+1 child pointers. Leaf nodes can either contain
data entries or pointers to data entries in other database files. For
simplicity, we will implement a B+tree in which the leaf pages actually contain
the data entries. Adjacent leaf pages are linked together with right and left
sibling pointers, so range scans only require one initial search through the
root and internal nodes to find the first leaf page. Subsequent leaf pages are
found by following right (or left) sibling pointers.
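The structure described in this paragraph can be sketched with toy in-memory classes. This is a hypothetical illustration (ToyNode, ToyInternal, ToyLeaf and RangeScan are not SimpleDB classes, and keys are plain ints): the internal node enforces the "m keys, m+1 child pointers" invariant, leaves hold the data entries themselves and are doubly linked, and a range scan starts from the first leaf and then only follows right sibling pointers.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory B+ tree nodes (not SimpleDB classes).
abstract class ToyNode {}

final class ToyInternal extends ToyNode {
    final int[] keys;
    final ToyNode[] children;

    ToyInternal(int[] keys, ToyNode[] children) {
        // Invariant: an internal node with m keys has m + 1 child pointers.
        if (children.length != keys.length + 1) {
            throw new IllegalArgumentException("need keys.length + 1 children");
        }
        this.keys = keys;
        this.children = children;
    }
}

final class ToyLeaf extends ToyNode {
    final int[] keys;       // the leaf holds the data entries themselves (sorted)
    ToyLeaf left, right;    // sibling pointers

    ToyLeaf(int... keys) { this.keys = keys; }

    // Link two adjacent leaves in both directions.
    static void link(ToyLeaf a, ToyLeaf b) {
        a.right = b;
        b.left = a;
    }
}

final class RangeScan {
    private RangeScan() {}

    // Collect every key in [lo, hi], starting from the leaf that one
    // root-to-leaf search returned and then following right sibling pointers.
    static List<Integer> scan(ToyLeaf first, int lo, int hi) {
        List<Integer> out = new ArrayList<>();
        for (ToyLeaf leaf = first; leaf != null; leaf = leaf.right) {
            for (int k : leaf.keys) {
                if (k > hi) return out;   // keys are sorted: nothing further qualifies
                if (k >= lo) out.add(k);
            }
        }
        return out;
    }
}
```

With leaves [1, 3, 6] and [6, 8, 9] under a root whose single key is 6, scanning [3, 8] from the first leaf yields 3, 6, 6, 8 without going back through the root.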

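Before starting, you need a clear picture of the overall B+ tree structure. SimpleDB's B+ tree uses four kinds of pages:

1. Root pointer page: implemented in SimpleDB as BTreeRootPtrPage.java. It is not the root node itself; it records the page id of the current root page (an internal or leaf page) and of the first header page;

2. Internal pages: the non-leaf nodes of the tree (including the root once the tree has more than one level), implemented as BTreeInternalPage. Each BTreeInternalPage consists of a sequence of entries (BTreeEntry), each holding a key plus a left and a right child pointer;

3. Leaf pages: the pages that store the tuples, implemented as BTreeLeafPage;

4. Header pages: track which pages of the B+ tree file are in use, implemented as BTreeHeaderPage.

The four page types are distinguished by the category stored in their BTreePageId, returned by pgcateg() (BTreePageId.ROOT_PTR, INTERNAL, LEAF or HEADER).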
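The idea of a page id that carries its category, and of code that dispatches on that category (as findLeafPage() does with pgcateg()), can be sketched as follows. The names PageCategory and PageIdSketch are hypothetical; SimpleDB itself uses int constants on BTreePageId rather than an enum, so treat this only as an illustration of the design.

```java
// Hypothetical stand-in for BTreePageId: the id carries a category tag.
// SimpleDB uses int constants (BTreePageId.LEAF etc.), not this enum.
enum PageCategory { ROOT_PTR, INTERNAL, LEAF, HEADER }

final class PageIdSketch {
    final int tableId;
    final int pageNo;
    final PageCategory category;

    PageIdSketch(int tableId, int pageNo, PageCategory category) {
        this.tableId = tableId;
        this.pageNo = pageNo;
        this.category = category;
    }

    // Describe what a page of this category is used for.
    static String role(PageIdSketch pid) {
        switch (pid.category) {
            case ROOT_PTR: return "points to the current root page and the first header page";
            case INTERNAL: return "holds entries (key, left child, right child)";
            case LEAF:     return "holds the tuples themselves";
            case HEADER:   return "tracks which pages of the file are in use";
            default:       throw new IllegalStateException("unknown category");
        }
    }
}
```

Because the category travels with the id, a caller can decide how to cast and read a page before fetching it, which is exactly what the first lines of findLeafPage() rely on.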

II. Implementation

1. Search

Given a field and a page, recursively descend from that page to the leaf page where tuples with that key belong.

Your first job is to implement the findLeafPage() function in
BTreeFile.java. This function is used to find the appropriate leaf page given
a particular key value, and is used for both searches and inserts. For example,
suppose we have a B+Tree with two leaf pages (See Figure 1). The root node is an
internal page with one entry containing one key (6, in this case) and two child
pointers. Given a value of 1, this function should return the first leaf page.
Likewise, given a value of 8, this function should return the second page. The
less obvious case is if we are given a key value of 6. There may be duplicate
keys, so there could be 6’s on both leaf pages. In this case, the function
should return the first (left) leaf page.
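The per-node rule in this description can be sketched as follows, assuming int keys in place of SimpleDB's Field objects; ChildChooser is a hypothetical helper. Keys are scanned left to right, the first key greater than or equal to the search value selects its left child, and if no key qualifies the last entry's right child is taken.

```java
// Child selection at one internal node, with int keys standing in for Field.
// keys[i] separates child i (its left child) from child i + 1 (its right child).
final class ChildChooser {
    private ChildChooser() {}

    // Return the index of the child to descend into for search key f.
    // Going left when keys[i] >= f means that when duplicates of f are spread
    // over two leaves, the search lands on the left one.
    static int childIndex(int[] keys, int f) {
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] >= f) return i;   // left child of entry i
        }
        return keys.length;               // right child of the last entry
    }
}
```

For the handout's example (a root with the single key 6), the values 1 and 6 select child 0 (the first leaf) and 8 selects child 1 (the second leaf).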

Exercise 1: BTreeFile.findLeafPage()

Implement BTreeFile.findLeafPage().

After completing this exercise, you should be able to pass all the unit tests
in BTreeFileReadTest.java and the system tests in BTreeScanTest.java.

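This part mostly follows the hints in the handout. The implementation steps are:

1. Get the page category;

2. If the page is a leaf, the recursion ends and that page is returned;

3. Otherwise it is an internal page, so cast it to BTreeInternalPage;

4. Get an iterator over the internal page's entries;

5. Iterate over the entries. Note the case where field is null: then simply descend to the left-most leaf page;

6. Find the first entry whose key is greater than or equal to field and recurse into its left child;

7. If no entry qualifies (the iteration passed the last entry), recurse into the last entry's right child.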

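This requires a basic understanding of how B+ tree search works. Also pay attention to permissions: the transaction code from lab4 acquires different locks depending on the permission passed in, so internal pages are fetched with READ_ONLY and only the leaf page is fetched with the caller's perm. The implementation: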

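	private BTreeLeafPage findLeafPage(TransactionId tid, Map<PageId, Page> dirtypages, BTreePageId pid, Permissions perm,
                                       Field f)
					throws DbException, TransactionAbortedException {
		// 1. Get the category of the page
		int type = pid.pgcateg();
		// 2. A leaf page ends the recursion: fetch it with the requested permission and return it
		if (type == BTreePageId.LEAF) return (BTreeLeafPage) getPage(tid, dirtypages, pid, perm);
		// 3. Internal pages are only read on the way down, so fetch them with READ_ONLY
		BTreeInternalPage internalPage = (BTreeInternalPage) getPage(tid, dirtypages, pid, Permissions.READ_ONLY);
		// 4. Iterate over the page's entries
		Iterator<BTreeEntry> it = internalPage.iterator();
		// Declared outside the loop: if no entry matches, we follow the last entry's right child
		BTreeEntry entry = null;
		while (it.hasNext()) {
			entry = it.next();
			// f == null: always take the left child, which leads to the left-most leaf
			if (f == null) return findLeafPage(tid, dirtypages, entry.getLeftChild(), perm, f);
			Field key = entry.getKey();
			// First key >= f: go left, so duplicates of f are found in the left-most leaf
			if (key.compare(Op.GREATER_THAN_OR_EQ, f)) return findLeafPage(tid, dirtypages, entry.getLeftChild(), perm, f);
		}
		// A valid internal page always has at least one entry; guard against an empty page
		if (entry == null) throw new DbException("internal page " + pid + " has no entries");
		// f is greater than every key on this page: follow the last entry's right child
		return findLeafPage(tid, dirtypages, entry.getRightChild(), perm, f);
	}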
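To experiment with the descent logic outside SimpleDB, a toy version may help. It is a sketch under stated assumptions: integer keys, hypothetical in-memory classes (SearchNode, SearchInternal, SearchLeaf, ToySearch), and no buffer pool, locks or permissions. A null key plays the role of f == null and leads to the left-most leaf.

```java
import java.util.Arrays;

// Toy, SimpleDB-free version of the recursive descent in findLeafPage().
abstract class SearchNode {}

final class SearchInternal extends SearchNode {
    final int[] keys;
    final SearchNode[] children;   // expected: children.length == keys.length + 1

    SearchInternal(int[] keys, SearchNode... children) {
        this.keys = keys;
        this.children = children;
    }
}

final class SearchLeaf extends SearchNode {
    final int[] tuples;

    SearchLeaf(int... tuples) { this.tuples = tuples; }

    @Override
    public String toString() { return Arrays.toString(tuples); }
}

final class ToySearch {
    private ToySearch() {}

    // Descend from node to the leaf where key f belongs; f == null means
    // "no lower bound", so always take the first child.
    static SearchLeaf findLeaf(SearchNode node, Integer f) {
        if (node instanceof SearchLeaf) return (SearchLeaf) node;   // recursion ends at a leaf
        SearchInternal in = (SearchInternal) node;
        for (int i = 0; i < in.keys.length; i++) {
            // first key >= f selects its left child (duplicates go left)
            if (f == null || in.keys[i] >= f) return findLeaf(in.children[i], f);
        }
        return findLeaf(in.children[in.keys.length], f);   // f exceeds every key
    }
}
```

With a root key of 6 whose two subtrees both contain a 6, findLeaf(root, 6) returns the leaf in the left subtree, which matches the handout's rule for duplicate keys.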

Test cases

The tests search the B+ tree index as follows:

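1. Create the predicate. This B+ tree indexes a single column, so an IndexPredicate is just a comparison operator plus one key value; the operators are greater than, less than, equals, greater than or equal, less than or equal, and not equal:

	IndexPredicate ipred = new IndexPredicate(Op.GREATER_THAN, f);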

2. Call BTreeFile.indexIterator() to obtain the results; indexIterator() simply creates a BTreeSearchIterator:

	DbFileIterator it = twoLeafPageFile.indexIterator(tid, ipred);

	public DbFileIterator indexIterator(TransactionId tid, IndexPredicate ipred) {
		return new BTreeSearchIterator(this, tid, ipred);
	}

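3. When the results are needed, the caller invokes BTreeSearchIterator's open() and then reads tuples through the iterator (BTreeSearchIterator extends AbstractDbFileIterator, whose hasNext()/next() call readNext()):

4. open() comes first and opens the iterator. It fetches the root pointer page through the buffer pool's getPage(), which acquires a lock on it. The first findLeafPage() call then reads the root node through BTreeFile.getPage(); because the root was written to the file as an internal page, it is traversed as one. In the test traced here the root has 9 entries, so the first pass walks the root's 9 entries and then descends. For EQUALS, GREATER_THAN and GREATER_THAN_OR_EQ predicates the descent uses the predicate's key; for every other operator it passes null and starts from the left-most leaf. open() only locates this starting leaf page and creates an iterator over it; the actual matching of tuples happens in the next step.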

	public void open() throws DbException, TransactionAbortedException {
		BTreeRootPtrPage rootPtr = (BTreeRootPtrPage) Database.getBufferPool().getPage(
				tid, BTreeRootPtrPage.getId(f.getId()), Permissions.READ_ONLY);
		BTreePageId root = rootPtr.getRootId();
		if(ipred.getOp() == Op.EQUALS || ipred.getOp() == Op.GREATER_THAN 
				|| ipred.getOp() == Op.GREATER_THAN_OR_EQ) {
			curp = f.findLeafPage(tid, root, ipred.getField());
		}
		else {
			curp = f.findLeafPage(tid, root, null);
		}
		it = curp.iterator(<